Revision History


24594—Rev. 3.33—November 2021 AMD64 Technology 


Date            Revision   Description
December 2011   3.17       Corrected second byte of VEX C5 escape sequence in Figure 1-2 on page 5.
                           Made multiple corrections to the description of register-indirect
                           addressing in Section 1.4 on page 17.
                           Corrected mod field value in third row of Figure 1-16 on page 25.
                           Updated pseudocode definition (see Section 2.5.3 on page 57).
September 2011
Preface


About This Book 


This book is part of a multivolume work entitled the AMD64 Architecture Programmer’s Manual. The
following table lists each volume and its order number.


Title Order No. 
Volume 1: Application Programming 24592 
Volume 2: System Programming 24593 
Volume 3: General-Purpose and System Instructions 24594 
Volume 4: 128-Bit and 256-Bit Media Instructions 26568 
Volume 5: 64-Bit Media and x87 Floating-Point Instructions 26569 


Audience 


This volume (Volume 3) is intended for all programmers writing application or system software for a 
processor that implements the AMD64 architecture. Descriptions of general-purpose instructions 
assume an understanding of the application-level programming topics described in Volume 1. 
Descriptions of system instructions assume an understanding of the system-level programming topics 
described in Volume 2. 


Organization 


Volumes 3, 4, and 5 describe the AMD64 architecture’s instruction set in detail. Together, they cover 
each instruction’s mnemonic syntax, opcodes, functions, affected flags, and possible exceptions. 

The AMD64 instruction set is divided into five subsets: 

• General-purpose instructions

• System instructions

• Streaming SIMD Extensions (SSE), which include the 128-bit and 256-bit media instructions

• 64-bit media instructions (MMX™)

• x87 floating-point instructions

Several instructions belong to multiple instruction subsets and are described identically in each.


This volume describes the general-purpose and system instructions. The index at the end
cross-references topics within this volume. For other topics relating to the AMD64 architecture, and for
information on instructions in other subsets, see the tables of contents and indexes of the other
volumes.


Conventions and Definitions 


The following Notational Conventions section describes the notational conventions used in this volume
and in the remaining volumes of the AMD64 Architecture Programmer’s Manual. It is followed
by a Definitions section, which lists a number of terms used in the manual along with their technical
definitions. Finally, the Registers section lists the registers that are part of the application
programming model.


Notational Conventions 
#GP(0) 
An instruction exception—in this example, a general-protection exception with error code of 0. 


1011b 
A binary value—in this example, a 4-bit value. 


F0EA_0B02h
A hexadecimal value. Underscore characters may be inserted to improve readability. 


128 
Numbers without an alpha suffix are decimal unless the context indicates otherwise. 

7:4 
A bit range, from bit 7 to 4, inclusive. The high-order bit is shown first. Commas may be inserted 
to indicate gaps. 

CPUID FnXXXX_XXXX_RRR[FieldName]


Support for optional features or the value of an implementation-specific parameter of a processor
can be discovered by executing the CPUID instruction on that processor. To obtain this value,
software must execute the CPUID instruction with the function code XXXX_XXXXh in EAX and
then examine the field FieldName returned in register RRR. If the “_RRR” notation is followed by
“_xYYY”, register ECX must be set to the value YYYh before executing CPUID. When FieldName
is not given, the entire contents of register RRR contain the desired value. When determining
optional feature support, if the bit identified by FieldName is set to one, the feature is supported
on that processor.
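
As an illustration only, the CPUID notation can be split mechanically into the inputs and outputs it names. The following Python sketch assumes the notation is written exactly as described above; the function name parse_cpuid_notation, the dictionary keys, and the field name ExampleField are hypothetical and are not defined by the architecture.

```python
import re

# Matches "CPUID FnXXXX_XXXX_RRR[FieldName]" with an optional "_xYYY"
# ECX input value and an optional field name (both may be absent).
_CPUID_NOTATION = re.compile(
    r"^CPUID\s+Fn(?P<fn>[0-9A-Fa-f]{4}_[0-9A-Fa-f]{4})"
    r"_(?P<reg>E[ABCD]X)"
    r"(?:_x(?P<ecx>[0-9A-Fa-f]+))?"
    r"(?:\[(?P<field>\w+)\])?$"
)

def parse_cpuid_notation(text):
    """Split the notation into the EAX/ECX inputs and the result location."""
    m = _CPUID_NOTATION.match(text.strip())
    if m is None:
        raise ValueError("not in CPUID FnXXXX_XXXX_RRR[FieldName] form")
    return {
        # Function code loaded into EAX before executing CPUID.
        "eax_in": int(m["fn"].replace("_", ""), 16),
        # Value loaded into ECX, or None when no "_xYYY" suffix is present.
        "ecx_in": int(m["ecx"], 16) if m["ecx"] is not None else None,
        # Register that returns the value of interest.
        "result_register": m["reg"],
        # Field within that register, or None for the whole register.
        "field": m["field"],
    }

info = parse_cpuid_notation("CPUID Fn8000_0001_EDX[ExampleField]")
```

The sketch only parses the text of the notation; it does not execute CPUID or know which fields actually exist.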


CR0-CR4


A register range, from register CR0 through CR4, inclusive, with the low-order register first.


CR0[PE], CR0.PE
Notation for referring to a field within a register; in this case, the PE field of the CR0 register.


CR0[PE] = 1, CR0.PE = 1
Notation indicating that the PE bit of the CR0 register has a value of 1.


DS:rSI 


The contents of a memory location whose segment address is in the DS register and whose offset 
relative to that segment is in the rSI register. 


EFER[LME] = 0, EFER.LME = 0 
Notation indicating that the LME bit of the EFER register has a value of 0. 


RFLAGS[13:12] 


A field within a register identified by its bit range. In this example, corresponding to the IOPL 
field. 
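
The bit-range notation maps directly onto a shift and a mask. The following Python sketch is illustrative only; the helper name bit_field and the sample RFLAGS value are assumptions made for the example.

```python
def bit_field(value, high, low):
    """Return bits high:low (inclusive) of value, high-order bit first."""
    if low < 0 or high < low:
        raise ValueError("expected high >= low >= 0")
    width = high - low + 1
    # Shift the field down to bit 0, then mask off everything above it.
    return (value >> low) & ((1 << width) - 1)

# Hypothetical RFLAGS image used only for illustration.
rflags = 0x3202
iopl = bit_field(rflags, 13, 12)   # RFLAGS[13:12], the IOPL field
```

With the sample value above, both bits of the field are set, so iopl evaluates to 3.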


Definitions 


Many of the following definitions assume an in-depth knowledge of the legacy x86 architecture. See 
“Related Documents” on page xxxvi for descriptions of the legacy x86 architecture. 


128-bit media instructions 


Instructions that operate on the various 128-bit vector data types. Supported within both the legacy 
SSE and extended SSE instruction sets. 


256-bit media instructions 


Instructions that operate on the various 256-bit vector data types. Supported within the extended 
SSE instruction set. 


64-bit media instructions 


Instructions that operate on the 64-bit vector data types. These are primarily a combination of 
MMX™ and 3DNow!™ instruction sets, with some additional instructions from the SSE1 and 
SSE2 instruction sets. 


16-bit mode 
Legacy mode or compatibility mode in which a 16-bit address size is active. See legacy mode and 
compatibility mode. 


32-bit mode 
Legacy mode or compatibility mode in which a 32-bit address size is active. See legacy mode and 
compatibility mode. 


64-bit mode 


A submode of long mode. In 64-bit mode, the default address size is 64 bits and new features, such
as register extensions, are supported for system and application software. 




absolute 


Said of a displacement that references the base of a code segment rather than an instruction pointer. 
Contrast with relative. 


biased exponent 


The sum of a floating-point value’s exponent and a constant bias for a particular floating-point data 
type. The bias makes the range of the biased exponent always positive, which allows reciprocation 
without overflow. 
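
As a worked illustration, the sketch below extracts the biased exponent of a single-precision value and removes the bias. It assumes the IEEE 754 single-precision layout (8-bit exponent field in bits 30:23, bias of 127) and applies to normalized values only; the function name is hypothetical.

```python
import struct

SINGLE_PRECISION_BIAS = 127  # IEEE 754 single-precision exponent bias

def single_precision_exponents(x):
    """Return (biased exponent, unbiased exponent) of x as a 32-bit float."""
    # Reinterpret the 32-bit float encoding as an unsigned integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    biased = (bits >> 23) & 0xFF          # exponent field, bits 30:23
    return biased, biased - SINGLE_PRECISION_BIAS

biased, unbiased = single_precision_exponents(8.0)
```

For 8.0 (1.0 × 2^3) the biased exponent is 130 and the unbiased exponent is 3.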


byte 
Eight bits. 


clear 
To write a bit value of 0. Compare set. 


compatibility mode 
A submode of long mode. In compatibility mode, the default address size is 32 bits, and legacy 16-bit
and 32-bit applications run without modification.


commit 


To irreversibly write, in program order, an instruction’s result to software-visible storage, such as a 
register (including flags), the data cache, an internal write buffer, or memory. 


CPL 
Current privilege level. 


direct 


Referencing a memory location whose address is included in the instruction’s syntax as an 
immediate operand. The address may be an absolute or relative address. Compare indirect. 


dirty data 


Data held in the processor’s caches or internal buffers that is more recent than the copy held in 
main memory. 


displacement 


A signed value that is added to the base of a segment (absolute addressing) or an instruction pointer 
(relative addressing). Same as offset. 


doubleword 
Two words, or four bytes, or 32 bits. 


double quadword 
Eight words, or 16 bytes, or 128 bits. Also called octword.


effective address size
The address size for the current instruction after accounting for the default address size and any 
address-size override prefix. 

effective operand size 
The operand size for the current instruction after accounting for the default operand size and any 
operand-size override prefix. 

element 
See vector. 


exception 
An abnormal condition that occurs as the result of executing an instruction. The processor’s 
response to an exception depends on the type of the exception. For all exceptions except 128-bit 
media SIMD floating-point exceptions and x87 floating-point exceptions, control is transferred to 
the handler (or service routine) for that exception, as defined by the exception’s vector. For 
floating-point exceptions defined by the IEEE 754 standard, there are both masked and unmasked 
responses. When unmasked, the exception handler is called, and when masked, a default response 
is provided instead of calling the handler. 

flush 
An often ambiguous term meaning (1) writeback, if modified, and invalidate, as in “flush the cache 
line,” or (2) invalidate, as in “flush the pipeline,” or (3) change a value, as in “flush to zero.” 

GDT 
Global descriptor table. 


IDT 
Interrupt descriptor table. 


IGN 


Ignored. Value written is ignored by hardware. Value returned on a read is indeterminate. See 
reserved. 


indirect 


Referencing a memory location whose address is in a register or other memory location. The 
address may be an absolute or relative address. Compare direct. 


IRB 


The virtual-8086 mode interrupt-redirection bitmap. 


IST 
The long-mode interrupt-stack table.


IVT
The real-address mode interrupt-vector table. 


LDT 
Local descriptor table. 


legacy x86 
The legacy x86 architecture. See “Related Documents” on page xxxvi for descriptions of the 
legacy x86 architecture. 


legacy mode 


An operating mode of the AMD64 architecture in which existing 16-bit and 32-bit applications and 
operating systems run without modification. A processor implementation of the AMD64 
architecture can run in either long mode or legacy mode. Legacy mode has three submodes, real 
mode, protected mode, and virtual-8086 mode. 


LIP 
Linear Instruction Pointer. LIP = (CS.base + rIP). 


long mode 


An operating mode unique to the AMD64 architecture. A processor implementation of the 
AMD64 architecture can run in either long mode or legacy mode. Long mode has two submodes, 
64-bit mode and compatibility mode. 


lsb
Least-significant bit. 


LSB 
Least-significant byte. 


main memory 


Physical memory, such as RAM and ROM (but not cache memory) that is installed in a particular 
computer system. 


mask 
(1) A control bit that prevents the occurrence of a floating-point exception from invoking an 
exception-handling routine. (2) A field of bits used for a control purpose. 

MBZ 
Must be zero. If software attempts to set an MBZ bit to 1, a general-protection exception (#GP) 
occurs. 

memory 
Unless otherwise specified, main memory.


ModRM
A byte following an instruction opcode that specifies address calculation based on mode (Mod), 
register (R), and memory (M) variables. 

moffset 
A 16, 32, or 64-bit offset that specifies a memory operand directly, without using a ModRM or SIB 
byte. 

msb 
Most-significant bit. 


MSB 
Most-significant byte. 


multimedia instructions 
A combination of 128-bit media instructions and 64-bit media instructions. 


octword 
Same as double quadword. 


offset 
Same as displacement. 


overflow 
The condition in which a floating-point number is larger in magnitude than the largest, finite, 
positive or negative number that can be represented in the data-type format being used. 

packed 
See vector. 


PAE 
Physical-address extensions. 


physical memory 
Actual memory, consisting of main memory and cache. 


probe 


A check for an address in a processor’s caches or internal buffers. External probes originate 
outside the processor, and internal probes originate within the processor. 


procedure stack 
A portion of a stack segment in memory that is used to link procedures. Also known as a program
stack.


program stack 
See procedure stack. 


protected mode 
A submode of legacy mode.


quadword 
Four words, or eight bytes, or 64 bits. 


RAZ 


Read as zero. Value returned on a read is always zero (0) regardless of what was previously 
written. See reserved. 


real-address mode 
See real mode. 


real mode 
A short name for real-address mode, a submode of legacy mode. 


relative 


Referencing with a displacement (also called offset) from an instruction pointer rather than the 
base of a code segment. Contrast with absolute. 


reserved 
Fields marked as reserved may be used at some future time. 


To preserve compatibility with future processors, reserved fields require special handling when 
read or written by software. Software must not depend on the state of a reserved field (unless 
qualified as RAZ), nor upon the ability of such fields to return a previously written state. 


If a field is marked reserved without qualification, software must not change the state of that field; 
it must reload that field with the same value returned from a prior read. 


Reserved fields may be qualified as IGN, MBZ, RAZ, or SBZ (see definitions). 


REX 
An instruction prefix that specifies a 64-bit operand size and provides access to additional 
registers. 

RIP-relative addressing 
Addressing relative to the 64-bit RIP instruction pointer. 


SBZ 
Should be zero. An attempt by software to set an SBZ bit to 1 results in undefined behavior.


shadow stack
A shadow stack is a separate, protected stack that is conceptually parallel to the procedure stack 
and used only by the shadow stack feature. 

set 
To write a bit value of 1. Compare clear. 


SIB 


A byte following an instruction opcode that specifies address calculation based on scale (S), index
(I), and base (B).


SIMD 
Single instruction, multiple data. See vector. 


SSE 
Streaming SIMD extensions instruction set. See 128-bit media instructions and 64-bit media 
instructions. 

SSE2 
Extensions to the SSE instruction set. See 128-bit media instructions and 64-bit media 
instructions. 

SSE3 
Further extensions to the SSE instruction set. See 128-bit media instructions. 

sticky bit 
A bit that is set or cleared by hardware and that remains in that state until explicitly changed by 
software. 

TOP 
The x87 top-of-stack pointer. 


TPR 
Task-priority register (CR8). 


TSS
Task-state segment. 


underflow 


The condition in which a floating-point number is smaller in magnitude than the smallest nonzero, 
positive or negative number that can be represented in the data-type format being used. 


vector 
(1) A set of integer or floating-point values, called elements, that are packed into a single operand. 
Most of the 128-bit and 64-bit media instructions use vectors as operands. Vectors are also called
packed or SIMD (single-instruction multiple-data) operands.

(2) An index into an interrupt descriptor table (IDT), used to access exception handlers. Compare
exception.


virtual-8086 mode 
A submode of legacy mode.


word 
Two bytes, or 16 bits. 


x86 
See legacy x86. 


Registers 


In the following list of registers, the names are used to refer either to a given register or to the contents 
of that register: 


AH-DH

The high 8-bit AH, BH, CH, and DH registers. Compare AL-DL.


AL-DL

The low 8-bit AL, BL, CL, and DL registers. Compare AH-DH.


AL-r15B


The low 8-bit AL, BL, CL, DL, SIL, DIL, BPL, SPL, and R8B-R15B registers, available in 64-bit
mode.


BP 
Base pointer register. 


CRn 
Control register number n. 


CS 
Code segment register. 


eAX-eSP
The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers or the 32-bit EAX, EBX, ECX, EDX,
EDI, ESI, EBP, and ESP registers. Compare rAX-rSP.

EFER 
Extended features enable register. 


eFLAGS 
16-bit or 32-bit flags register. Compare rFLAGS.


EFLAGS
32-bit (extended) flags register. 


eIP 
16-bit or 32-bit instruction-pointer register. Compare rIP. 


EIP 
32-bit (extended) instruction-pointer register. 


FLAGS 
16-bit flags register. 


GDTR 
Global descriptor table register. 


GPRs 
General-purpose registers. For the 16-bit data size, these are AX, BX, CX, DX, DI, SI, BP, and SP. 
For the 32-bit data size, these are EAX, EBX, ECX, EDX, EDI, ESI, EBP, and ESP. For the 64-bit 
data size, these include RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, and R8-R15. 

IDTR 
Interrupt descriptor table register. 


IP 
16-bit instruction-pointer register. 


LDTR 
Local descriptor table register. 


MSR 
Model-specific register. 


r8-r15


The 8-bit R8B-R15B registers, or the 16-bit R8W-R15W registers, or the 32-bit R8D-R15D
registers, or the 64-bit R8-R15 registers.


rAX-rSP 


The 16-bit AX, BX, CX, DX, DI, SI, BP, and SP registers, or the 32-bit EAX, EBX, ECX, EDX, 
EDI, ESI, EBP, and ESP registers, or the 64-bit RAX, RBX, RCX, RDX, RDI, RSI, RBP, and RSP 
registers. Replace the placeholder r with nothing for 16-bit size, “E” for 32-bit size, or “R” for
64-bit size.


RAX 
64-bit version of the EAX register.


RBP
64-bit version of the EBP register. 


RBX 
64-bit version of the EBX register. 


RCX 
64-bit version of the ECX register. 


RDI 
64-bit version of the EDI register. 


RDX 
64-bit version of the EDX register. 


rFLAGS 
16-bit, 32-bit, or 64-bit flags register. Compare RFLAGS. 


RFLAGS 
64-bit flags register. Compare rFLAGS. 


rIP 
16-bit, 32-bit, or 64-bit instruction-pointer register. Compare RIP. 


RIP 
64-bit instruction-pointer register. 


RSI 
64-bit version of the ESI register. 


RSP 
64-bit version of the ESP register. 


SP 
Stack pointer register. 


SS 
Stack segment register. 


SSP 
Shadow-stack pointer register. 


TPR 


Task priority register, a new register introduced in the AMD64 architecture to speed interrupt
management.


TR
Task register. 


Endian Order 


The x86 and AMD64 architectures address memory using little-endian byte-ordering. Multibyte 
values are stored with their least-significant byte at the lowest byte address, and they are illustrated 
with their least significant byte at the right side. Strings are illustrated in reverse order, because the 
addresses of their bytes increase from right to left. 
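
The byte ordering described above can be illustrated with a short Python sketch. The helper names are hypothetical, and a Python bytes object stands in for memory, with index 0 as the lowest byte address.

```python
def store_little_endian(value, size):
    """Return the bytes of value as they would be laid out in memory."""
    # Index 0 of the result corresponds to the lowest byte address.
    return value.to_bytes(size, byteorder="little")

def load_little_endian(stored):
    """Reassemble a multibyte value from its little-endian byte image."""
    return int.from_bytes(stored, byteorder="little")

image = store_little_endian(0x12345678, 4)
# image[0] is the least-significant byte (78h); image[3] is 12h.
```

Note that the listing shows the bytes in increasing address order from left to right, whereas the manual's figures place the least-significant byte at the right side.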
1 Instruction Encoding


AMD64 technology instructions are encoded as byte strings of variable length. The order and meaning 
of each byte of an instruction’s encoding is specified by the architecture. Fields within the encoding 
specify the instruction’s basic operation, the location of the one or more source operands, and the 
destination of the result of the operation. Data to be used in the execution of the instruction or the 
computation of addresses for memory-based operands may also be included. This section describes the 
general format and parameters used by all instructions. 

For information on the specific encoding(s) for each instruction, see: 

• Chapter 3, “General-Purpose Instruction Reference.”

• Chapter 4, “System Instruction Reference.”

• “SSE Instruction Reference” in Volume 4.

• “64-Bit Media Instruction Reference” in Volume 5.

• “x87 Floating-Point Instruction Reference” in Volume 5.


For information on determining the instruction form and operands specified by a given binary 
encoding, see Appendix A. 


1.1 Instruction Encoding Overview 


An instruction is encoded as a string between one and 15 bytes in length. The entire sequence of bytes
that represents an instruction, including the basic operation, the location of source and destination
operands, any operation modifiers, and any immediate and/or displacement values, is called the
instruction encoding. The following sections discuss instruction encoding syntax and representation in
memory.


1.1.1 Encoding Syntax 


Figure 1-1 provides a schematic representation of the encoding syntax of an instruction. 
Figure 1-1. Instruction Encoding Syntax

[Figure not reproduced. Labels recovered from the diagram include the C4 3-byte sequence, VEX opcode
map 3, and XOP opcode maps 8, 9, and A (map = 08h, 09h, and 0Ah).]

Figure notes:
• map = VEX/XOP.map_select field.
• The total number of bytes in an instruction encoding must be less than or equal to 15.
• Instructions that encode an 8-byte immediate field do not use a displacement field and vice versa.



Each square in this diagram represents an instruction byte of a particular type and function. To 
understand the diagram, follow the connecting paths in the direction indicated by the arrows from 
“Start” to “End.” The squares passed through as the graph is traversed indicate the order and number of 
bytes used to encode the instruction. Note that the path shown above the legacy prefix byte loops back
indicating that up to four additional prefix bytes may be used in the encoding of a single instruction. 
Branches indicate points in the syntax where alternate semantics are employed based on the instruction 
being encoded. The “VEX or XOP” gate across the path leading down to the VEX prefix and XOP 
prefix blocks means that only extended instructions employing the VEX or XOP prefixes use this 
particular branch of the syntax diagram. This diagram will be further explained in the sections that 
follow. 


1.1.1.1 Legacy Prefixes 


As shown in the figure, an instruction optionally begins with up to five legacy prefixes. These prefixes 
are described in “Summary of Legacy Prefixes” on page 6. The legacy prefixes modify an instruction’s 
default address size, operand size, or segment, or they invoke a special function such as modification 
of the opcode, atomic bus-locking, or repetition. 


In the encoding of most SSE instructions, a legacy operand-size or repeat prefix is repurposed to 
modify the opcode. For the extended encodings utilizing the XOP or VEX prefixes, these prefixes are 
not allowed. 


1.1.1.2 REX Prefix 


Following the optional legacy prefix or prefixes, the REX prefix can be used in 64-bit mode to access 
the AMD64 register number and size extensions. Refer to the diagram in “Application-Programming 
Register Set” in Volume 1 for an illustration of these facilities. If a REX prefix is used, it must 
immediately precede the opcode byte or the first byte of a legacy escape sequence. The REX prefix is 
not allowed in extended instruction encodings using the VEX or XOP encoding escape prefixes. 
Violating this restriction results in an #UD exception. 
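
As an illustration, the following Python sketch recognizes a REX prefix byte and splits out its fields. It assumes the REX byte layout 0100WRXB (byte values 40h through 4Fh), which is taken from the “REX Prefix” section referenced above rather than from this passage; the function name decode_rex is hypothetical.

```python
def decode_rex(byte):
    """Return the REX fields of byte, or None if byte is not a REX prefix.

    Only meaningful in 64-bit mode; in other modes the same byte values
    are not interpreted as REX prefixes.
    """
    if byte & 0xF0 != 0x40:
        return None
    return {
        "W": (byte >> 3) & 1,  # 1 selects a 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModRM reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends the ModRM r/m, SIB base, or opcode reg field
    }

fields = decode_rex(0x48)
```

For the byte 48h, only the W field is set.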


1.1.1.3 Opcode 


The opcode is a single byte that specifies the basic operation of an instruction. Every instruction 
requires an opcode. The correspondence between the binary value of an opcode and the operation it 
represents is presented in a table called an opcode map. Because it is indexed by an 8-bit value, an 
opcode map has 256 entries. Since there are more than 256 instructions defined by the architecture, 
multiple different opcode maps must be defined and the selection of these alternate opcode maps must 
be encoded in the instruction. Escape sequences provide this access to alternate opcode maps. 


If there are no opcode escapes, the primary (“one-byte”) opcode map is used. In the figure this is the 
path pointing from the REX Prefix block to the Primary opcode map block. 


“Primary Opcode Map” in Appendix A provides details concerning this opcode map.


1.1.1.4 Escape Sequences 


Escape sequences allow access to alternate opcode maps that are distinct from the primary opcode 
map. Escape sequences may be one, two, or three bytes in length and begin with a unique byte value 
designated for this purpose in the primary opcode map. Escape sequences are of two distinct types: 
legacy escape sequences and extended escape sequences. The legacy escape sequences will be covered
here. For more details on the extended escape sequences, see “VEX and XOP Prefixes” on page 16. 


Legacy Escape Sequences 


The legacy syntax allows one 1-byte escape sequence (0Fh), and three 2-byte escape sequences (0Fh,
0Fh; 0Fh, 38h; and 0Fh, 3Ah). The 1-byte legacy escape sequence 0Fh selects the secondary (“two-byte”)
opcode map. In legacy terminology, the sequence [0Fh, opcode] is called a two-byte opcode.
See “Secondary Opcode Map” in Appendix A for details concerning this opcode map.


The 2-byte escape sequence [0Fh, 0Fh] selects the 3DNow! opcode map, which is indexed using an
immediate byte rather than an opcode byte. In this case, the byte following the escape sequence is the
ModRM byte instead of the opcode byte. In Figure 1-1 this is indicated by the path labeled “3DNow!”
leaving the second 0Fh escape block. Details concerning the 3DNow! opcode map are presented in
Section A.1.2, “3DNow!™ Opcodes” of Appendix A.


The 2-byte escape sequences [0Fh, 38h] and [0Fh, 3Ah] respectively select the 0F_38h opcode map
and the 0F_3Ah opcode map. These are used primarily to encode SSE instructions and are described in
“0F_38h and 0F_3Ah Opcode Maps” in Appendix A.
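
The legacy escape-sequence rules can be summarized in a few lines of Python. This is a sketch only: it assumes that any legacy and REX prefixes have already been consumed, it does not handle the VEX and XOP extended escape sequences, and the function name and returned map labels are hypothetical.

```python
def select_opcode_map(code, pos=0):
    """Return (opcode map name, length of the legacy escape sequence).

    code is a bytes object; pos indexes the first byte after any prefixes.
    """
    if pos >= len(code):
        raise ValueError("no opcode byte present")
    if code[pos] != 0x0F:
        return ("primary", 0)          # no escape: one-byte opcode map
    if pos + 1 >= len(code):
        raise ValueError("truncated escape sequence")
    second = code[pos + 1]
    if second == 0x0F:
        return ("3DNow!", 2)           # next byte is ModRM, not an opcode
    if second == 0x38:
        return ("0F_38h", 2)
    if second == 0x3A:
        return ("0F_3Ah", 2)
    return ("secondary", 1)            # second byte is the opcode itself

opcode_map, escape_length = select_opcode_map(bytes([0x0F, 0x38, 0x00]))
```

The returned length counts only escape bytes, so the opcode (or, for 3DNow!, the ModRM byte) is found at pos plus that length.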


1.1.1.5 ModRM and SIB Bytes 


The opcode can be followed by a mode-register-memory (ModRM) byte, which further describes the 
operation and/or operands. The ModRM byte may also be followed by a scale-index-base (SIB) byte, 
which is used to specify indexed register-indirect forms of memory addressing. The ModRM and SIB 
bytes are described in “ModRM Byte Format” on page 17. Their legacy functions can be augmented 
by the REX prefix (see “REX Prefix” on page 14) or the VEX and XOP escape sequences (See “VEX 
and XOP Prefixes” on page 16). 


1.1.1.6 Displacement and Immediate Fields 


The instruction encoding may end with a 1-, 2-, or 4-byte displacement field and/or a 1-, 2-, or 4-byte 
immediate field depending on the instruction and/or the addressing mode. Specific instructions also 
allow either an 8-byte immediate field or an 8-byte displacement field. 


1.1.2 Representation in Memory 


Instructions are stored in memory in little-endian order. The first byte of an instruction is stored at the 
lowest memory address, as shown in Figure 1-2 below. Since instructions are strings of bytes, they 
may start at any memory address. The total instruction length must be less than or equal to 15 bytes.
If this limit is exceeded, a general-protection exception (#GP) results.
Figure 1-2. An Instruction as Stored in Memory


1.2 Instruction Prefixes 


Instruction prefixes are of two types: instruction modifier prefixes and encoding escape prefixes. 
Instruction modifier prefixes can change the operation of the instruction (including causing its 
execution to repeat), change its operand types, specify an alternate operand size, augment register 
specification, or even change the interpretation of the opcode byte. 


The instruction modifier prefixes comprise the legacy prefixes and the REX prefix. The legacy 
prefixes are discussed in the next section. The REX prefix is discussed in “REX Prefix” on page 14. 


Encoding escape prefixes, on the other hand, signal that the two or three bytes that follow obey a 
different encoding syntax. As a group, the encoding escape prefix and its subsequent bytes constitute a 
multi-byte escape sequence. These multi-byte escape sequences perform functions similar to that of 
the instruction modifier prefixes, but they also provide a means to directly specify alternate opcode
maps. 


The currently defined encoding escape prefixes are the VEX and XOP prefixes. They are discussed 
further in the section entitled “VEX and XOP Prefixes” on page 16. 


1.2.1 Summary of Legacy Prefixes 


Table 1-1 on page 7 shows the legacy prefixes. The legacy prefixes are organized into five groups, as 
shown in the left-most column of Table 1-1. An instruction encoding may include a maximum of one 
prefix from each of the five groups. The legacy prefixes can appear in any order within the position 
shown in Figure 1-1 for legacy prefixes. The result of using multiple prefixes from a single group is 
undefined. 


Some of the restrictions on legacy prefixes are: 


• Operand-Size Override: This prefix only affects the operand size for general-purpose instructions
or for other instructions whose source or destination is a general-purpose register. When used in the
encoding of SIMD and some other instructions, this prefix is repurposed to modify the opcode.

• Address-Size Override: This prefix only affects the address size of memory operands.

• Segment Override: In 64-bit mode, the CS, DS, ES, and SS segment override prefixes are
ignored.

• LOCK Prefix: This prefix is allowed only with certain instructions that modify memory.

• Repeat Prefixes: These prefixes affect only certain string instructions. When used in the encoding
of SIMD and some other instructions, these prefixes are repurposed to modify the opcode.


Note that Lock and Repeat prefixes are in effect mutually exclusive when used as instruction 
modifiers, in that there are no instructions for which both are meaningful. 
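
The one-prefix-per-group rule can be illustrated with a small scanner. The Python sketch below uses the prefix byte values of Table 1-1; the names LEGACY_PREFIX_GROUPS and scan_legacy_prefixes are hypothetical. Because the architecture leaves the result of repeating a group undefined, the sketch chooses to reject such encodings, which is a policy of this example and not processor behavior.

```python
# Prefix byte -> prefix group, following Table 1-1.
LEGACY_PREFIX_GROUPS = {
    0x66: "operand-size override",
    0x67: "address-size override",
    0x2E: "segment override",  # CS
    0x3E: "segment override",  # DS
    0x26: "segment override",  # ES
    0x64: "segment override",  # FS
    0x65: "segment override",  # GS
    0x36: "segment override",  # SS
    0xF0: "lock",
    0xF3: "repeat",            # REP, REPE/REPZ
    0xF2: "repeat",            # REPNE/REPNZ
}

def scan_legacy_prefixes(code):
    """Return ({group: prefix byte}, number of prefix bytes consumed)."""
    seen = {}
    pos = 0
    # Legacy prefixes may appear in any order at the start of the encoding.
    while pos < len(code) and code[pos] in LEGACY_PREFIX_GROUPS:
        group = LEGACY_PREFIX_GROUPS[code[pos]]
        if group in seen:
            raise ValueError("more than one prefix from group: " + group)
        seen[group] = code[pos]
        pos += 1
    return seen, pos

prefixes, count = scan_legacy_prefixes(bytes([0x66, 0xF3, 0x0F, 0x10]))
```

The scan stops at the first byte that is not a legacy prefix, which is where a REX prefix, escape sequence, or opcode would begin.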
Table 1-1. Legacy Instruction Prefixes

Prefix Group (1)         Mnemonic          Prefix Byte (Hex)   Description
Operand-Size Override    none              66 (2)              Changes the default operand size of a memory or
                                                               register operand, as shown in Table 1-2 on page 8.
Address-Size Override    none              67 (3)              Changes the default address size of a memory operand,
                                                               as shown in Table 1-3 on page 9.
Segment Override         CS                2E (4)              Forces use of the current CS segment for memory
                                                               operands.
                         DS                3E (4)              Forces use of the current DS segment for memory
                                                               operands.
                         ES                26 (4)              Forces use of the current ES segment for memory
                                                               operands.
                         FS                64                  Forces use of the current FS segment for memory
                                                               operands.
                         GS                65                  Forces use of the current GS segment for memory
                                                               operands.
                         SS                36 (4)              Forces use of the current SS segment for memory
                                                               operands.
Lock                     LOCK              F0 (5)              Causes certain kinds of memory read-modify-write
                                                               instructions to occur atomically.
Repeat                   REP               F3 (6)              Repeats a string operation (INS, MOVS, OUTS, LODS,
                                                               and STOS) until the rCX register equals 0.
                         REPE or REPZ      F3 (6)              Repeats a compare-string or scan-string operation
                                                               (CMPSx and SCASx) until the rCX register equals 0 or
                                                               the zero flag (ZF) is cleared to 0.
                         REPNE or REPNZ    F2 (6)              Repeats a compare-string or scan-string operation
                                                               (CMPSx and SCASx) until the rCX register equals 0 or
                                                               the zero flag (ZF) is set to 1.

Notes:
1. A single instruction should include no more than one prefix from each of the Override prefix groups plus either a
Lock or Repeat prefix, when used as instruction modifiers.
2. When used in the encoding of SIMD and some other instructions, this prefix is repurposed to extend the opcode.
The prefix is ignored by 64-bit media floating-point (3DNow!™) instructions. See “Instructions that Cannot Use the
Operand-Size Prefix” on page 8.
3. This prefix also changes the size of the RCX register when used as an implied count register.
4. In 64-bit mode, the CS, DS, ES, and SS segment overrides are ignored.
5. The LOCK prefix should not be used for instructions other than those listed in “Lock Prefix” on page 11.
6. This prefix should be used only with compare-string and scan-string instructions. When used in the encoding of
SIMD and some other instructions, the prefix is repurposed to extend the opcode.



1.2.2 Operand-Size Override Prefix 


The default operand size for an instruction is determined by a combination of its opcode, the D 
(default) bit in the current code-segment descriptor, and the current operating mode, as shown in 
Table 1-2. The operand-size override prefix (66h) selects the non-default operand size. The prefix can 
be used with any general-purpose instruction that accesses non-fixed-size operands in memory or
general-purpose registers (GPRs), and it can also be used with the x87 FLDENV, FNSTENV, 
FNSAVE, and FRSTOR instructions. 


In 64-bit mode, the prefix allows mixing of 16-bit, 32-bit, and 64-bit data on an instruction-by- 
instruction basis. In compatibility and legacy modes, the prefix allows mixing of 16-bit and 32-bit 
operands on an instruction-by-instruction basis. 


Table 1-2. Operand-Size Overrides


                                Default        Effective      Instruction Prefix (note 1)
Operating Mode                  Operand Size   Operand Size   66h          REX.W (note 3)
                                (Bits)         (Bits)
Long Mode: 64-Bit Mode          32 (note 2)    64             don’t care   yes
                                               32             no           no
                                               16             yes          no
Long Mode: Compatibility Mode   32             32             no           Not applicable
                                               16             yes          Not applicable
                                16             32             yes          Not applicable
                                               16             no           Not applicable
Legacy Mode (Protected,         32             32             no           Not applicable
Virtual-8086, or Real Mode)                    16             yes          Not applicable
                                16             32             yes          Not applicable
                                               16             no           Not applicable
Notes:
1. A ‘no’ indicates that the default operand size is used.
2. This is the typical default, although some instructions default to other operand
   sizes. See Appendix B, “General-Purpose Instructions in 64-Bit Mode,” for details.
3. See “REX Prefix” on page 14.


In 64-bit mode, most instructions default to a 32-bit operand size. For these instructions, a REX prefix 
(page 14) can specify a 64-bit operand size, and a 66h prefix specifies a 16-bit operand size. The REX 
prefix takes precedence over the 66h prefix. However, if an instruction defaults to a 64-bit operand 
size, it does not need a REX prefix and it can only be overridden to a 16-bit operand size. It cannot be 
overridden to a 32-bit operand size, because there is no 32-bit operand-size override prefix in 64-bit 
mode. Two groups of instructions have a default 64-bit operand size in 64-bit mode: 


• Near branches. For details, see “Near Branches in 64-Bit Mode” in Volume 1.

• All instructions, except far branches, that implicitly reference the RSP. For details, see “Stack
  Operation” in Volume 1.


Instructions that Cannot Use the Operand-Size Prefix. The operand-size prefix should be used 
only with general-purpose instructions and the x87 FLDENV, FNSTENV, FNSAVE, and FRSTOR 
instructions, in which the prefix selects between 16-bit and 32-bit operand size. The prefix is ignored
by all other x87 instructions and by 64-bit media floating-point (3DNow!™) instructions. 


For other instructions (mostly SIMD instructions) the 66h, F2h, and F3h prefixes are used as opcode 
extensions to extend the instruction encoding space in the 0Fh, 0F_38h, and 0F_3Ah opcode maps.


Operand-Size and REX Prefixes. The W bit field of the REX prefix takes precedence over the 66h 
prefix. See “REX.W: Operand width (Bit 3)” on page 23 for details. 
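The selection rules in Table 1-2 and the precedence of REX.W over 66h can be restated as a small function. This is a minimal sketch under stated assumptions: the function name and the mode labels ("64-bit", "compatibility", "legacy") are hypothetical, and instruction-specific exceptions (fixed-size operands, byte operations) are not modeled.

```python
def effective_operand_size(mode, default_size, prefix_66=False, rex_w=False):
    """Return the effective operand size in bits, following Table 1-2.

    mode:         "64-bit", "compatibility", or "legacy" (illustrative labels)
    default_size: default operand size in bits for the instruction
    """
    if mode == "64-bit":
        if default_size not in (32, 64):
            raise ValueError("default operand size in 64-bit mode is 32 or 64")
        if rex_w:
            return 64          # REX.W takes precedence over the 66h prefix
        if prefix_66:
            return 16          # 66h selects 16 bits, also for 64-bit defaults
        return default_size
    if mode not in ("compatibility", "legacy"):
        raise ValueError("unknown operating mode")
    if default_size not in (16, 32):
        raise ValueError("default operand size must be 16 or 32 in this mode")
    if prefix_66:
        return 32 if default_size == 16 else 16   # 66h toggles 16 <-> 32
    return default_size
```

Note that for an instruction whose default is 64 bits, no combination of arguments yields 32, which mirrors the statement that such instructions cannot be overridden to a 32-bit operand size.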


1.2.3 Address-Size Override Prefix


The default address size for instructions that access non-stack memory is determined by the current 
operating mode, as shown in Table 1-3. The address-size override prefix (67h) selects the non-default 
address size. Depending on the operating mode, this prefix allows mixing of 16-bit and 32-bit, or of 
32-bit and 64-bit addresses, on an instruction-by-instruction basis. The prefix changes the address size 
for memory operands. It also changes the size of the RCX register for instructions that use RCX 
implicitly. 


For instructions that implicitly access the stack segment (SS), the address size for stack accesses is 
determined by the D (default) bit in the stack-segment descriptor. In 64-bit mode, the D bit is ignored, 
and all stack references have a 64-bit address size. However, if an instruction accesses both stack and 
non-stack memory, the address size of the non-stack access is determined as shown in Table 1-3. 


Table 1-3. Address-Size Overrides


                                Default        Effective      Address-Size
Operating Mode                  Address Size   Address Size   Prefix (67h)
                                (Bits)         (Bits)         Required? (note 1)
Long Mode: 64-Bit Mode          64             64             no
                                               32             yes
Long Mode: Compatibility Mode   32             32             no
                                               16             yes
                                16             32             yes
                                               16             no
Legacy Mode (Protected,         32             32             no
Virtual-8086, or Real Mode)                    16             yes
                                16             32             yes
                                               16             no
Notes:
1. A “no” indicates that the default address size is used.


As Table 1-3 shows, the default address size is 64 bits in 64-bit mode. The size can be overridden to 32 
bits, but 16-bit addresses are not supported in 64-bit mode. In compatibility and legacy modes, the 
default address size is 16 bits or 32 bits, depending on the operating mode (see “Processor 
Initialization and Long Mode Activation” in Volume 2 for details). In these modes, the address-size
prefix selects the non-default size, but the 64-bit address size is not available. 
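The address-size rules in Table 1-3 reduce to a toggle between two sizes per mode. The following is a minimal sketch; the function name and mode labels are hypothetical, and it covers only non-stack memory accesses.

```python
def effective_address_size(mode, default_size, prefix_67=False):
    """Return the effective address size in bits, following Table 1-3."""
    if mode == "64-bit":
        if default_size != 64:
            raise ValueError("default address size in 64-bit mode is 64")
        # 67h selects 32-bit addresses; 16-bit addresses are not supported.
        return 32 if prefix_67 else 64
    if mode not in ("compatibility", "legacy"):
        raise ValueError("unknown operating mode")
    if default_size not in (16, 32):
        raise ValueError("default address size must be 16 or 32 in this mode")
    if prefix_67:
        return 32 if default_size == 16 else 16   # 67h toggles 16 <-> 32
    return default_size
```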


Certain instructions reference pointer registers or count registers implicitly, rather than explicitly. In 
such instructions, the address-size prefix affects the size of such addressing and count registers, just as 
it does when such registers are explicitly referenced. Table 1-4 lists all such instructions and the 
registers referenced using the three possible address sizes. 


Table 1-4. Pointer and Count Registers and the Address-Size Prefix


                                               Pointer or Count Register
Instruction                                    16-Bit         32-Bit          64-Bit
                                               Address Size   Address Size    Address Size
CMPS, CMPSB, CMPSW, CMPSD, CMPSQ               SI, DI, CX     ESI, EDI, ECX   RSI, RDI, RCX
  (Compare Strings)
INS, INSB, INSW, INSD                          DI, CX         EDI, ECX        RDI, RCX
  (Input String)
JCXZ, JECXZ, JRCXZ                             CX             ECX             RCX
  (Jump on CX/ECX/RCX Zero)
LODS, LODSB, LODSW, LODSD, LODSQ               SI, CX         ESI, ECX        RSI, RCX
  (Load String)
LOOP, LOOPE, LOOPNZ, LOOPNE, LOOPZ             CX             ECX             RCX
  (Loop)
MOVS, MOVSB, MOVSW, MOVSD, MOVSQ               SI, DI, CX     ESI, EDI, ECX   RSI, RDI, RCX
  (Move String)
OUTS, OUTSB, OUTSW, OUTSD                      SI, CX         ESI, ECX        RSI, RCX
  (Output String)
REP, REPE, REPNE, REPNZ, REPZ                  CX             ECX             RCX
  (Repeat Prefixes)
SCAS, SCASB, SCASW, SCASD, SCASQ               DI, CX         EDI, ECX        RDI, RCX
  (Scan String)
STOS, STOSB, STOSW, STOSD, STOSQ               DI, CX         EDI, ECX        RDI, RCX
  (Store String)
XLAT, XLATB                                    BX             EBX             RBX
  (Table Look-up Translation)
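Table 1-4 follows a regular pattern: each instruction family has a fixed set of implicit registers, and the address size only selects the register width. A minimal sketch of that lookup follows; the family keys and the function name are hypothetical shorthand for the rows of the table.

```python
# Register-name prefix for each address size (16-bit names have no prefix).
WIDTH_PREFIX = {16: "", 32: "E", 64: "R"}

# Implicit pointer and count registers per instruction family (Table 1-4).
IMPLICIT_REGISTERS = {
    "CMPS": ("SI", "DI", "CX"),
    "INS": ("DI", "CX"),
    "JrCXZ": ("CX",),
    "LODS": ("SI", "CX"),
    "LOOP": ("CX",),
    "MOVS": ("SI", "DI", "CX"),
    "OUTS": ("SI", "CX"),
    "REP": ("CX",),
    "SCAS": ("DI", "CX"),
    "STOS": ("DI", "CX"),
    "XLAT": ("BX",),
}


def implicit_registers(family, address_size):
    """Return the implicit registers used at the given address size."""
    prefix = WIDTH_PREFIX[address_size]
    return tuple(prefix + name for name in IMPLICIT_REGISTERS[family])
```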


1.2.4 Segment-Override Prefixes 


Segment overrides can be used only with instructions that reference non-stack memory. Most 
instructions that reference memory are encoded with a ModRM byte (page 17). The default segment 
for such memory-referencing instructions is implied by the base register indicated in its ModRM byte,
as follows: 


• Instructions that Reference a Non-Stack Segment. If an instruction encoding references any base
  register other than rBP or rSP, or if an instruction contains an immediate offset, the default segment
  is the data segment (DS). These instructions can use the segment-override prefix to select one of
  the non-default segments, as shown in Table 1-5.

• String Instructions. String instructions reference two memory operands. By default, they
  reference both the DS and ES segments (DS:rSI and ES:rDI). These instructions can override their
  DS-segment reference, as shown in Table 1-5, but they cannot override their ES-segment
  reference.

• Instructions that Reference the Stack Segment. If an instruction’s encoding references the rBP or
  rSP base register, the default segment is the stack segment (SS). All instructions that reference the
  stack (push, pop, call, interrupt, return from interrupt) use SS by default. These instructions cannot
  use the segment-override prefix.


Table 1-5. Segment-Override Prefixes


Mnemonic      Prefix Byte (Hex)   Description
CS (note 1)   2E                  Forces use of current CS segment for memory operands.
DS (note 1)   3E                  Forces use of current DS segment for memory operands.
ES (note 1)   26                  Forces use of current ES segment for memory operands.
FS            64                  Forces use of current FS segment for memory operands.
GS            65                  Forces use of current GS segment for memory operands.
SS (note 1)   36                  Forces use of current SS segment for memory operands.
Notes:
1. In 64-bit mode, the CS, DS, ES, and SS segment overrides are ignored.


Segment Overrides in 64-Bit Mode. In 64-bit mode, the CS, DS, ES, and SS segment-override 
prefixes have no effect. These four prefixes are not treated as segment-override prefixes for the 
purposes of multiple-prefix rules. Instead, they are treated as null prefixes. 


The FS and GS segment-override prefixes are treated as true segment-override prefixes in 64-bit 
mode. Use of the FS or GS prefix causes their respective segment bases to be added to the effective 
address calculation. See “FS and GS Registers in 64-Bit Mode” in Volume 2 for details. 
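The default-segment rule and the 64-bit treatment of segment overrides can be combined in a short sketch. The function names are hypothetical, and the sketch is deliberately narrow: it covers only an explicit memory operand with an optional base register, and does not model implicit stack accesses or the ES:rDI destination of string instructions.

```python
STACK_BASE_REGISTERS = {"BP", "EBP", "RBP", "SP", "ESP", "RSP"}
NULL_IN_64BIT_MODE = {"CS", "DS", "ES", "SS"}   # treated as null prefixes


def default_segment(base_register=None):
    """Default segment implied by the base register (None means no base)."""
    if base_register is not None and base_register.upper() in STACK_BASE_REGISTERS:
        return "SS"
    return "DS"


def effective_segment(base_register=None, override=None, mode_64bit=False):
    """Apply an optional segment-override prefix to the default segment."""
    segment = default_segment(base_register)
    if override is None:
        return segment
    if mode_64bit and override in NULL_IN_64BIT_MODE:
        return segment          # override has no effect in 64-bit mode
    return override             # FS and GS remain true overrides
```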


1.2.5 Lock Prefix 


The LOCK prefix causes certain kinds of memory read-modify-write instructions to occur atomically. 
The mechanism for doing so is implementation-dependent (for example, the mechanism may involve 
bus signaling or packet messaging between the processor and a memory controller). The prefix is 
intended to give the processor exclusive use of shared memory in a multiprocessor system. 




The LOCK prefix can only be used with forms of the following instructions that write a memory 
operand: ADC, ADD, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, CMPXCHG16B, DEC, 
INC, NEG, NOT, OR, SBB, SUB, XADD, XCHG, and XOR. An invalid-opcode exception occurs if 
the LOCK prefix is used with any other instruction. 
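The LOCK rule above amounts to a membership test plus the requirement that the instruction form writes a memory operand. A minimal sketch follows; the function name and the `writes_memory` flag are hypothetical, and a real assembler or decoder would derive that flag from the operand encoding.

```python
# Instructions that accept the LOCK prefix, as listed in the text above.
LOCKABLE = frozenset({
    "ADC", "ADD", "AND", "BTC", "BTR", "BTS", "CMPXCHG", "CMPXCHG8B",
    "CMPXCHG16B", "DEC", "INC", "NEG", "NOT", "OR", "SBB", "SUB",
    "XADD", "XCHG", "XOR",
})


def lock_prefix_allowed(mnemonic, writes_memory):
    """Return True if LOCK may be used; otherwise an invalid-opcode
    exception would be expected."""
    return writes_memory and mnemonic.upper() in LOCKABLE
```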


1.2.6 Repeat Prefixes 


The repeat prefixes cause repetition of certain instructions that load, store, move, input, or output 
strings. The prefixes should only be used with such string instructions. Two pairs of repeat prefixes, 
REPE/REPZ and REPNE/REPNZ, perform the same repeat functions for certain compare-string and 
scan-string instructions. The repeat function uses rCX as a count register. The size of rCX is based on 
address size, as shown in Table 1-4 on page 10. 


REP. The REP prefix repeats its associated string instruction the number of times specified in the 
counter register (rCX). It terminates the repetition when the value in rCX reaches 0. The prefix can be 
used with the INS, LODS, MOVS, OUTS, and STOS instructions. Table 1-6 shows the valid REP 
prefix opcodes. 


Table 1-6. REP Prefix Opcodes


Mnemonic                                    Opcode
REP INS reg/mem8, DX                        F3 6C
REP INSB                                    F3 6C
REP INS reg/mem16/32, DX                    F3 6D
REP INSW                                    F3 6D
REP INSD                                    F3 6D
REP LODS mem8                               F3 AC
REP LODSB                                   F3 AC
REP LODS mem16/32/64                        F3 AD
REP LODSW                                   F3 AD
REP LODSD                                   F3 AD
REP LODSQ                                   F3 AD
REP MOVS mem8, mem8                         F3 A4
REP MOVSB                                   F3 A4
REP MOVS mem16/32/64, mem16/32/64           F3 A5
REP MOVSW                                   F3 A5
REP MOVSD                                   F3 A5
REP MOVSQ                                   F3 A5
REP OUTS DX, reg/mem8                       F3 6E
REP OUTSB                                   F3 6E
REP OUTS DX, reg/mem16/32                   F3 6F
REP OUTSW                                   F3 6F
REP OUTSD                                   F3 6F
REP STOS mem8                               F3 AA
REP STOSB                                   F3 AA
REP STOS mem16/32/64                        F3 AB
REP STOSW                                   F3 AB
REP STOSD                                   F3 AB
REP STOSQ                                   F3 AB


REPE and REPZ. REPE and REPZ are synonyms and have identical opcodes. These prefixes repeat 
their associated string instruction the number of times specified in the counter register (rCX). The 
repetition terminates when the value in rCX reaches 0 or when the zero flag (ZF) is cleared to 0. The 
REPE and REPZ prefixes can be used with the CMPS, CMPSB, CMPSD, CMPSW, SCAS, SCASB, 
SCASD, and SCASW instructions. Table 1-7 shows the valid REPE and REPZ prefix opcodes. 


Table 1-7. REPE and REPZ Prefix Opcodes


Mnemonic                                    Opcode
REPx CMPS mem8, mem8                        F3 A6
REPx CMPSB                                  F3 A6
REPx CMPS mem16/32/64, mem16/32/64          F3 A7
REPx CMPSW                                  F3 A7
REPx CMPSD                                  F3 A7
REPx CMPSQ                                  F3 A7
REPx SCAS mem8                              F3 AE
REPx SCASB                                  F3 AE
REPx SCAS mem16/32/64                       F3 AF
REPx SCASW                                  F3 AF
REPx SCASD                                  F3 AF
REPx SCASQ                                  F3 AF


REPNE and REPNZ. REPNE and REPNZ are synonyms and have identical opcodes. These prefixes 
repeat their associated string instruction the number of times specified in the counter register (rCX). 
The repetition terminates when the value in rCX reaches 0 or when the zero flag (ZF) is set to 1. The 
REPNE and REPNZ prefixes can be used with the CMPS, CMPSB, CMPSD, CMPSW, SCAS, 
SCASB, SCASD, and SCASW instructions. Table 1-8 on page 14 shows the valid REPNE and 
REPNZ prefix opcodes. 


Table 1-8. REPNE and REPNZ Prefix Opcodes


Mnemonic                                    Opcode
REPNx CMPS mem8, mem8                       F2 A6
REPNx CMPSB                                 F2 A6
REPNx CMPS mem16/32/64, mem16/32/64         F2 A7
REPNx CMPSW                                 F2 A7
REPNx CMPSD                                 F2 A7
REPNx CMPSQ                                 F2 A7
REPNx SCAS mem8                             F2 AE
REPNx SCASB                                 F2 AE
REPNx SCAS mem16/32/64                      F2 AF
REPNx SCASW                                 F2 AF
REPNx SCASD                                 F2 AF
REPNx SCASQ                                 F2 AF


Instructions that Cannot Use Repeat Prefixes. In general, the repeat prefixes should only be used 
in the string instructions listed in tables 1-6, 1-7, and 1-8 above. For other instructions (mostly SIMD 
instructions) the 66h, F2h, and F3h prefixes are used as instruction modifiers to extend the instruction 
encoding space in the 0Fh, 0F_38h, and 0F_3Ah opcode maps.
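The termination conditions of the REPE/REPZ and REPNE/REPNZ prefixes described above can be modeled in a few lines. This is a minimal behavioral sketch, assuming byte-sized elements and DF = 0; the function name and parameters are hypothetical, and the sketch ignores segments, faults, and interrupts.

```python
def repeat_compare(src, dst, count, stop_when_zf, zf=False):
    """Model a repeated byte compare (CMPSB-style) with the direction flag clear.

    stop_when_zf=False models REPE/REPZ   (stop when ZF is cleared to 0).
    stop_when_zf=True  models REPNE/REPNZ (stop when ZF is set to 1).
    Returns (remaining_count, zf). If count is 0 on entry, nothing executes
    and zf is returned unchanged.
    """
    index = 0
    while count != 0:
        zf = src[index] == dst[index]   # the compare sets or clears ZF
        index += 1
        count -= 1                      # rCX counts down once per iteration
        if zf == stop_when_zf:
            break
    return count, zf
```

With REPE, a remaining count of zero together with ZF = 1 indicates that all compared elements matched; a cleared ZF indicates the repetition stopped at a mismatch.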


Optimization of Repeats. Depending on the hardware implementation, the repeat prefixes can have 
a setup overhead. If the repeated count is variable, the overhead can sometimes be avoided by 
substituting a simple loop to move or store the data. Repeated string instructions can be expanded into 
equivalent sequences of inline loads and stores, or a sequence of stores can be used to emulate a REP
STOS. 


For repeated string moves, performance can be maximized by moving the largest possible operand 
size. For example, use REP MOVSD rather than REP MOVSW and REP MOVSW rather than REP 
MOVSB. Use REP STOSD rather than REP STOSW and REP STOSW rather than REP STOSB.


Depending on the hardware implementation, string moves with the direction flag (DF) cleared to 0 
(up) may be faster than string moves with DF set to 1 (down). DF = 1 is only needed for certain cases 
of overlapping REP MOVS, such as when the source and the destination overlap. 


1.2.7 REX Prefix


The REX prefix, available in 64-bit mode, enables use of the AMD64 register and operand size 
extensions. Unlike the legacy instruction modification prefixes, REX is not a single unique value, but 
occupies a range (40h to 4Fh). Figure 1-1 on page 2 shows how the REX prefix fits within the 
encoding syntax of instructions. 


The REX prefix enables the following features in 64-bit mode: 


• Use of the extended GPR (Figure 2-3 on page 39) and YMM/XMM registers (Figure 2-8 on
  page 44).

• Use of the 64-bit operand size when accessing GPRs.

• Use of the extended control and debug registers, as described in Section 2.4 “Registers” in
  Volume 2.

• Use of the uniform byte registers (AL through R15).


REX contains five fields. The upper nibble is unique to the REX prefix and identifies it as such. The
lower nibble is divided into four 1-bit fields (W, R, X, and B). See below for a discussion of these
fields. Figure 1-3 below shows the format of the REX prefix. Since each bit of the lower nibble can be
a 1 or a 0, REX spans one full row of the primary opcode map, occupying entries 40h through 4Fh.


Bits:     7   6   5   4 | 3 | 2 | 1 | 0
          0   1   0   0 | W | R | X | B
               (4h)


Figure 1-3. REX Prefix Format 
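The REX layout in Figure 1-3 can be unpacked with simple shifts and masks. The following is a minimal sketch; the function name `decode_rex` is hypothetical, and the caller is assumed to have already established that the byte appears in prefix position in 64-bit mode.

```python
def decode_rex(byte):
    """Split a REX prefix byte into its W, R, X, and B bits.

    Returns None if the upper nibble is not 4h, that is, if the byte is
    outside the 40h through 4Fh range.
    """
    if (byte & 0xF0) != 0x40:
        return None
    return {
        "W": (byte >> 3) & 1,   # bit 3: 64-bit operand size
        "R": (byte >> 2) & 1,   # bit 2: extends ModRM.reg
        "X": (byte >> 1) & 1,   # bit 1: extends SIB.index
        "B": byte & 1,          # bit 0: extends ModRM.r/m, SIB.base, or opcode reg
    }
```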


A REX prefix is normally required with an instruction that accesses a 64-bit GPR or one of the 
extended GPR or YMM/XMM registers. A few instructions have an operand size that defaults to (or is 
fixed at) 64 bits in 64-bit mode, and thus do not need a REX prefix. These instructions are listed in 
Table 1-9 below. 


Table 1-9. Instructions Not Requiring REX Prefix in 64-Bit Mode


CALL (Near)          POP reg/mem
ENTER                POP reg
Jcc                  POP FS
JrCXZ                POP GS
JMP (Near)           POPF, POPFD, POPFQ
LEAVE                PUSH imm8
LGDT                 PUSH imm32
LIDT                 PUSH reg/mem
LLDT                 PUSH reg
LOOP                 PUSH FS
LOOPcc               PUSH GS
LTR                  PUSHF, PUSHFD, PUSHFQ
MOV CRn              RET (Near)
MOV DRn


An instruction may have only one REX prefix, which must immediately precede the opcode or first
escape byte in the instruction encoding. The use of a REX prefix in an instruction that does not access 
an extended register is ignored. The instruction-size limit of 15 bytes applies to instructions that 
contain a REX prefix. 


Implications for INC and DEC Instructions


The REX prefix values are taken from the 16 single-byte INC and DEC instructions, one for each of 
the eight legacy GPRs. Therefore, these single-byte opcodes for INC and DEC are not available in 64- 
bit mode, although they are available in legacy and compatibility modes. The functionality of these 
INC and DEC instructions is still available in 64-bit mode, however, using the ModRM forms of those 
instructions (opcodes FF /0 and FF /1). 
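The dual meaning of bytes 40h through 4Fh can be shown with a small lookup. This is a minimal sketch: the function name is hypothetical, the eAX-style register labels are shorthand for the 16-bit or 32-bit register selected by operand size, and the split of the range into INC (40h to 47h) and DEC (48h to 4Fh) is taken from the legacy one-byte opcode assignments rather than from this section.

```python
# Legacy GPRs in encoding order 000b through 111b.
LEGACY_GPRS = ("eAX", "eCX", "eDX", "eBX", "eSP", "eBP", "eSI", "eDI")


def interpret_40_to_4f(byte, mode_64bit):
    """Describe how a byte in the range 40h-4Fh is interpreted."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("byte is outside the range 40h-4Fh")
    if mode_64bit:
        return "REX prefix"
    operation = "INC" if byte < 0x48 else "DEC"
    return f"{operation} {LEGACY_GPRS[byte & 0b111]}"
```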


1.2.8 VEX and XOP Prefixes


The extended instruction encoding syntax, available in protected and long modes, provides one 2-byte 
and three 3-byte escape sequences introduced by either the VEX or XOP prefixes. These multi-byte 
sequences not only select opcode maps, they also provide instruction modifiers similar to, but in lieu 
of, the REX prefix. 


The 2-byte escape sequence initiated by the VEX C5h prefix implies a map_select encoding of 1. The 
three-byte escape sequences, initiated by the VEX C4h prefix or the XOP (8Fh) prefix, select the target 
opcode map explicitly via the VEX/XOP.map_select field. The five-bit VEX.map_select field allows 
the selection of one of 31 different opcode maps (opcode map 00h is reserved). The XOP.map_select
field is restricted to the range 08h to 1Fh and thus can only select one of 24 different opcode maps.


The VEX and XOP escape sequences contain fields that extend register addressing to a total of 16, 
increase the operand specification capability to four operands, and modify the instruction operation. 


The extended SSE instruction subsets AVX, AES, CLMUL, FMA, FMA4, and XOP and a few non-SSE
instructions utilize the extended encoding syntax. See “Encoding Using the VEX and XOP Prefixes” 
on page 29 for details on the encoding of the two- and three-byte extended escape sequences. 


1.3 Opcode 


The opcode is a single byte that specifies the basic operation of an instruction. In some cases, it also 
specifies the operands for the instruction. Every instruction requires an opcode. The correspondence 
between the binary value of the opcode and the operation it represents is defined by a table called an 
opcode map. As discussed in the previous sections, the legacy prefixes 66h, F2h, and F3h and other 
fields within the instruction encoding may be used to modify the operation encoded by the opcode. 


The effect of the presence of a 66h, F2h, or F3h prefix on the operation performed by the opcode is
represented in the opcode map by additional rows in the table indexed by the applicable prefix. The 3- 
bit reg and r/m fields of the ModRM byte (“ModRM Byte Format” on page 17 and “SIB Byte Format” 
on page 18) are used as well in the encoding of certain instructions. This is represented in the opcode 
maps via instruction group tables that detail the modifications represented via the extra encoding bits. 
See Section A.1, “Opcode Maps” of Appendix A for examples. 


Even though each instruction has a unique opcode map and opcode, assemblers often support multiple 
alternate mnemonics for the same instruction to improve the readability of assembly language code. 


The 64-bit floating-point 3DNow! instructions utilize the two-byte escape sequence 0Fh, 0Fh to select
the 3DNow! opcode map. For these instructions the opcode is encoded in the immediate field at the 
end of the instruction encoding. 


For details on how the opcode byte encodes the basic operation for specific instructions, see Section
A.1, “Opcode Maps” of Appendix A.


1.4 ModRM and SIB Bytes 

The ModRM byte is optional depending on the instruction. When present, it follows the opcode and is 
used to specify: 

• two register-based operands, or

• one register-based operand and a second memory-based operand and an addressing mode.


In the encoding of some instructions, fields within the ModRM byte are repurposed to provide 
additional opcode bits used to define the instruction’s function. 


The ModRM byte is partitioned into three fields—mod, reg, and r/m. Normally the reg field specifies a 
register-based operand and the mod and r/m fields used together specify a second operand that is either 
register-based or memory-based. The addressing mode is also specified when the operand is memory- 
based. 


In 64-bit mode, the REX.R and REX.B bits augment the reg and r/m fields, respectively, allowing the
specification of twice the number of registers. 


1.4.1 ModRM Byte Format 


Figure 1-4 below shows the format of a ModRM byte. 


Bits:     7   6 | 5   4   3 | 2   1   0
           mod  |    reg    |    r/m        ModRM

reg: REX.R, VEX.R, or XOP.R extend this field to 4 bits.
r/m: REX.B, VEX.B, or XOP.B extend this field to 4 bits.


Figure 1-4. ModRM-Byte Format


Depending on the addressing mode, the SIB byte may appear after the ModRM byte. SIB is used in the
specification of various forms of indexed register-indirect addressing. See the following section for
details.


ModRM.mod (Bits[7:6]). The mod field is used with the r/m field to specify the addressing mode for
an operand. ModRM.mod = 11b specifies the register-direct addressing mode. In the register-direct 
mode, the operand is held in the specified register. ModRM.mod values less than 11b specify register- 
indirect addressing modes. In register-indirect addressing modes, values held in registers along with an 
optional displacement specified in the instruction encoding are used to calculate the address of a 
memory-based operand. Other encodings of the 5 bits {mod, r/m} are discussed below. 


ModRM.reg (Bits[5:3]). The reg field is used to specify a register-based operand, although for some 
instructions, this field is used to extend the operation encoding. The encodings for this field are shown 
in Table 1-10 below. 


ModRM.r/m (Bits[2:0]). As stated above, the r/m field is used in combination with the mod field to 
encode 32 different operand specifications (See Table 1-14 on page 21). The encodings for this field 
are shown in Table 1-10 below. 
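The three ModRM fields described above can be extracted with shifts and masks. A minimal sketch follows; the function name `decode_modrm` is hypothetical, and the REX, VEX, and XOP extension bits are not applied here.

```python
def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11    # bits 7:6, addressing mode selector
    reg = (byte >> 3) & 0b111   # bits 5:3, register operand or opcode extension
    rm = byte & 0b111           # bits 2:0, register or memory operand
    return mod, reg, rm
```

For example, C3h decodes to mod = 11b, reg = 000b, r/m = 011b, which is a register-direct form.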


Table 1-10. ModRM.reg and .r/m Field Encodings


Encoded value   ModRM.reg (note 1)           ModRM.r/m, mod = 11b (note 1)   ModRM.r/m, mod ≠ 11b (note 2)
(binary)
000             rAX, MMX0, XMM0, YMM0        rAX, MMX0, XMM0, YMM0           [rAX]
001             rCX, MMX1, XMM1, YMM1        rCX, MMX1, XMM1, YMM1           [rCX]
010             rDX, MMX2, XMM2, YMM2        rDX, MMX2, XMM2, YMM2           [rDX]
011             rBX, MMX3, XMM3, YMM3        rBX, MMX3, XMM3, YMM3           [rBX]
100             AH, rSP, MMX4, XMM4, YMM4    AH, rSP, MMX4, XMM4, YMM4       SIB (note 3)
101             CH, rBP, MMX5, XMM5, YMM5    CH, rBP, MMX5, XMM5, YMM5       [rBP] (note 4)
110             DH, rSI, MMX6, XMM6, YMM6    DH, rSI, MMX6, XMM6, YMM6       [rSI]
111             BH, rDI, MMX7, XMM7, YMM7    BH, rDI, MMX7, XMM7, YMM7       [rDI]

Notes:
1. Specific register used is instruction-dependent.
2. mod = 01 and mod = 10 include an offset specified by the instruction displacement field.
   The notation [*] signifies that the specified register holds the address of the operand.
3. Indexed register-indirect addressing. SIB byte follows ModRM byte. See following section for SIB encoding.
4. For mod = 00b, r/m = 101b signifies absolute (displacement-only) addressing in 32-bit mode or RIP-relative
   addressing in 64-bit mode, where the rBP register is not used. For mod = [01b, 10b], r/m = 101b specifies
   the base + offset addressing mode with [rBP] as the base.


Similar to the reg field, r/m is used in some instructions to extend the operation encoding. 


1.4.2  SIB Byte Format 


The SIB byte has three fields—scale, index, and base—that define the scale factor, index-register 
number, and base-register number for the 32-bit and 64-bit indexed register-indirect addressing 
modes. 


The basic formula for computing the effective address of a memory-based operand using the indexed
register-indirect address modes is:


effective_address = scale * index + base + offset


Specific variants of this addressing mode set one or more elements of the sum to zero.


Figure 1-5 below shows the format of the SIB byte. 


Bits:     7   6 | 5   4   3 | 2   1   0
          scale |   index   |   base        SIB

index: The REX.X bit of the REX prefix can extend this field to 4 bits.
base:  The REX.B bit of the REX prefix can extend this field to 4 bits.


Figure 1-5. SIB Byte Format 


SIB.scale (Bits[7:6]). The scale field is used to specify the scale factor used in computing the 
scale*index portion of the effective address. In normal usage scale represents the size of data elements 
in an array expressed in number of bytes. SIB.scale is encoded as shown in Table 1-11 below. 


Table 1-11. SIB.scale Field Encodings


Encoded value   Scale
(binary)        factor
00              1
01              2
10              4
11              8


SIB.index (Bits[5:3]). The index field is used to specify the register containing the index portion of 
the indexed register-indirect effective address. SIB.index is encoded as shown in Table 1-12 below. 


SIB.base (Bits[2:0]). The base field is used to specify the register containing the base address 
portion of the indexed register-indirect effective address. SIB.base is encoded as shown in Table 1-12 
below. 


Table 1-12. SIB.index and .base Field Encodings


Encoded value   SIB.index         SIB.base
(binary)
000             [rAX]             [rAX]
001             [rCX]             [rCX]
010             [rDX]             [rDX]
011             [rBX]             [rBX]
100             (none) (note 1)   [rSP]
101             [rBP]             [rBP], (none) (note 2)
110             [rSI]             DH, [rSI]
111             [rDI]             BH, [rDI]
Notes:
1. Register specification is null. The scale*index portion of the indexed register-indirect
   effective address is set to 0.
2. If ModRM.mod = 00b, the register specification is null. The base portion of the indexed
   register-indirect effective address is set to 0. Otherwise, base encodes the rBP register as
   the source of the base address used in the effective address calculation.


Table 1-13. SIB.base encodings for ModRM.r/m = 100b


                                   SIB base Field
mod   000     001     010     011     100     101             110     111
00    [rAX]   [rCX]   [rDX]   [rBX]   [rSP]   disp32          [rSI]   [rDI]
01    [rAX]   [rCX]   [rDX]   [rBX]   [rSP]   [rBP]+disp8     [rSI]   [rDI]
10    [rAX]   [rCX]   [rDX]   [rBX]   [rSP]   [rBP]+disp32    [rSI]   [rDI]
11    (not applicable)
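The SIB field layout and the effective-address formula can be combined into a short worked sketch. It rests on several assumptions: the function names are hypothetical, `regs` is a caller-supplied list of eight register values indexed by encoding (rAX = 0 through rDI = 7), no REX extension bits are applied, and the result is not truncated to the effective address size.

```python
def decode_sib(byte):
    """Split a SIB byte into (scale_factor, index, base)."""
    scale_factor = 1 << ((byte >> 6) & 0b11)   # 00b,01b,10b,11b -> 1,2,4,8
    index = (byte >> 3) & 0b111
    base = byte & 0b111
    return scale_factor, index, base


def effective_address(sib, mod, regs, disp=0):
    """Compute scale * index + base + offset for a SIB-based operand."""
    scale_factor, index, base = decode_sib(sib)
    # index = 100b encodes "no index": the scaled-index term is 0.
    index_value = 0 if index == 0b100 else regs[index]
    # base = 101b with mod = 00b encodes "no base": only disp32 remains.
    base_value = 0 if (base == 0b101 and mod == 0b00) else regs[base]
    return scale_factor * index_value + base_value + disp
```

As a usage example, with rAX = 1000h and rCX = 3, the SIB byte 88h (scale 4, index rCX, base rAX) and an 8-bit displacement of 8 give 4 * 3 + 1000h + 8 = 1014h.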


More discussion of operand addressing follows in the next two sections. 


1.4.3 Operand Addressing in Legacy 32-bit and Compatibility Modes


The mod and r/m fields of the ModRM byte provide a total of five bits used to encode 32 operand 
specification and memory addressing modes. Table 1-14 below shows these encodings. 


Table 1-14. Operand Addressing Using ModRM and SIB Bytes


ModRM.mod   ModRM.r/m   Register / Effective Address
00          000         [rAX]
            001         [rCX]
            010         [rDX]
            011         [rBX]
            100         SIB (note 1)
            101         disp32
            110         [rSI]
            111         [rDI]
01          000         [rAX]+disp8
            001         [rCX]+disp8
            010         [rDX]+disp8
            011         [rBX]+disp8
            100         SIB+disp8 (note 2)
            101         [rBP]+disp8
            110         [rSI]+disp8
            111         [rDI]+disp8
10          000         [rAX]+disp32
            001         [rCX]+disp32
            010         [rDX]+disp32
            011         [rBX]+disp32
            100         SIB+disp32 (note 3)
            101         [rBP]+disp32
            110         [rSI]+disp32
            111         [rDI]+disp32
11          000         AL/rAX/MMX0/XMM0/YMM0
            001         CL/rCX/MMX1/XMM1/YMM1
            010         DL/rDX/MMX2/XMM2/YMM2
            011         BL/rBX/MMX3/XMM3/YMM3
            100         AH/SPL/rSP/MMX4/XMM4/YMM4
            101         CH/BPL/rBP/MMX5/XMM5/YMM5
            110         DH/SIL/rSI/MMX6/XMM6/YMM6
            111         BH/DIL/rDI/MMX7/XMM7/YMM7
Notes:
0. In the following notes, scaled_index = SIB.index * (1 << SIB.scale).
1. SIB byte follows ModRM byte. Effective address is calculated using
   scaled_index+base. When SIB.base = 101b, addressing mode depends on
   ModRM.mod. See Table 1-13 above.
2. SIB byte follows ModRM byte. Effective address is calculated using
   scaled_index+base+8-bit_offset. One-byte Displacement field provides the offset.
3. SIB byte follows ModRM byte. Effective address is calculated using
   scaled_index+base+32-bit_offset. Four-byte Displacement field provides the offset.


Note that the addressing mode mod = 11b is a register-direct mode, that is, the operand is contained in 
the specified register, while the modes mod = [00b:10b] specify different addressing modes for a 
memory-based operand. 


For mod = 11b, the register containing the operand is specified by the r/m field. For the other modes
(mod = [00b:10b]), the mod and r/m fields are combined to specify the addressing mode for the
memory-based operand. Most are register-indirect addressing modes, meaning that the address of the
memory-based operand is contained in the register specified by r/m. For these register-indirect modes,
mod = 01b and mod = 10b include an offset encoded in the displacement field of the instruction.


The encodings {mod ≠ 11b, r/m = 100b} specify the indexed register-indirect addressing mode in
which the target address is computed using a combination of values stored in registers and a scale 
factor encoded directly in the SIB byte. For these addressing modes the effective address is given by 
the formula: 


effective_address = scale * index + base + offset 


Scale is encoded in the SIB.scale field. Index is contained in the register specified by the SIB.index
field, and base is contained in the register specified by the SIB.base field. Offset is encoded in the
displacement field of the instruction using either one or four bytes.


If {mod, r/m} = 00100b, the offset portion of the formula is set to 0. For {mod, r/m} = 01100b and
{mod, r/m} = 10100b, offset is encoded in the 1-byte or 4-byte displacement field of the instruction, respectively.


Finally, the encoding {mod, r/m} = 00101b specifies an absolute addressing mode. In this mode, the 
address is provided directly in the instruction encoding using a 4-byte displacement field. In 64-bit 
mode this addressing mode is changed to RIP-relative (see “RIP-Relative Addressing” on page 24). 
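The classification of {mod, r/m} pairs described in this section can be summarized as a small function. This is a minimal sketch for 32-bit and 64-bit addressing only; the function name and the returned description strings are hypothetical. It reports the displacement size implied by ModRM alone, so the extra disp32 that applies when mod = 00b and SIB.base = 101b (Table 1-13) is noted in a comment rather than modeled.

```python
def addressing_mode(mod, rm, mode_64bit=False):
    """Return (description, displacement_bytes) for a {mod, r/m} pair."""
    if mod == 0b11:
        return "register-direct", 0
    disp_bytes = {0b00: 0, 0b01: 1, 0b10: 4}[mod]
    if rm == 0b100:
        # A SIB byte follows. With mod = 00b and SIB.base = 101b a disp32
        # is also present; that case needs the SIB byte and is not handled.
        return "indexed register-indirect (SIB byte follows)", disp_bytes
    if mod == 0b00 and rm == 0b101:
        # Displacement-only form: absolute, or RIP-relative in 64-bit mode.
        return ("RIP-relative" if mode_64bit else "absolute"), 4
    return "register-indirect", disp_bytes
```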


1.4.4 Operand Addressing in 64-bit Mode


The AMD64 architecture doubles the number of GPRs and increases their width to 64 bits. It also doubles
the number of YMM/XMM registers. In order to support the specification of register operands 
contained in the eight additional GPRs or YMM/XMM registers and to make the additional GPRs 
available to hold addresses to be used in the addressing modes, the REX prefix provides the R, X, and 
B bit fields to extend the reg, r/m, index, and base fields of the ModRM and SIB bytes in the various 
operand addressing modes to four bits. A fourth REX bit field (W) allows instruction encodings to 
specify a 64-bit operand size. 


Table 1-15 below and the sections that follow describe each of these bit fields. 


Table 1-15. REX Prefix-Byte Fields


Mnemonic   Bit Position(s)   Definition
-          7:4               0100 (4h)
REX.W      3                 0 = Default operand size
                             1 = 64-bit operand size
REX.R      2                 1-bit (msb) extension of the ModRM reg field (note 1),
                             permitting access to 16 registers.
REX.X      1                 1-bit (msb) extension of the SIB index field (note 1),
                             permitting access to 16 registers.
REX.B      0                 1-bit (msb) extension of the ModRM r/m field (note 1),
                             SIB base field (note 1), or opcode reg field,
                             permitting access to 16 registers.

Notes:
1. For a description of the ModRM and SIB bytes, see “ModRM and SIB Bytes” on
   page 17.


REX.W: Operand width (Bit 3). Setting the REX.W bit to 1 specifies a 64-bit operand size. Like the 
existing 66h operand-size override prefix, the REX 64-bit operand-size override has no effect on byte 
operations. For non-byte operations, the REX operand-size override takes precedence over the 66h 
prefix. If a 66h prefix is used together with a REX prefix that has the W bit set to 1, the 66h prefix is 
ignored. However, if a 66h prefix is used together with a REX prefix that has the W bit cleared to 0, 
the 66h prefix is not ignored and the operand size becomes 16 bits. 
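
The precedence rules above can be summarized in a short Python sketch. The function name and its parameters are illustrative only and are not part of the architecture. The 32-bit default operand size is an assumption that holds for most, but not all, instructions in 64-bit mode.

```python
def effective_operand_size(rex_w: bool, has_66h: bool,
                           byte_op: bool = False,
                           default_size: int = 32) -> int:
    """Return the operand size in bits for an instruction in 64-bit mode.

    default_size is an assumed default (32 bits); some instructions
    default to 64 bits and would pass a different value.
    """
    if byte_op:
        return 8             # neither REX.W nor 66h affects byte operations
    if rex_w:
        return 64            # REX.W = 1 takes precedence; a 66h prefix is ignored
    if has_66h:
        return 16            # REX.W = 0: the 66h prefix still applies
    return default_size


print(effective_operand_size(rex_w=True, has_66h=True))    # REX.W wins
print(effective_operand_size(rex_w=False, has_66h=True))   # 66h applies
```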


REX.R: Register field extension (Bit 2). The REX.R bit adds a 1-bit extension (in the most 
significant bit position) to the ModRM.reg field when that field encodes a GPR, YMM/XMM, control,
or debug register. REX.R does not modify ModRM.reg when that field specifies other registers or is 
used to extend the opcode. REX.R is ignored in such cases. 


REX.X: Index field extension (Bit 1). The REX.X bit adds a 1-bit (msb) extension to the SIB.index 
field. See “ModRM and SIB Bytes” on page 17.


REX.B: Base field extension (Bit 0). The REX.B bit adds a 1-bit (msb) extension to either the
ModRM.r/m field to specify a GPR or XMM register, or to the SIB.base field to specify a GPR. (See
Table 2-2 on page 56 for more about the B bit.)
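
As an illustration of how the REX bits combine with the three-bit ModRM fields, the following Python sketch splits a REX byte into its W, R, X, and B bits and forms four-bit register numbers. The helper names (Rex, decode_rex, extend) are hypothetical and chosen only for this example.

```python
from typing import NamedTuple, Optional


class Rex(NamedTuple):
    w: int
    r: int
    x: int
    b: int


def decode_rex(byte: int) -> Optional[Rex]:
    """Split a REX prefix byte into W, R, X, B; return None if it is not REX."""
    if (byte >> 4) != 0b0100:          # bits 7:4 must be 0100 (4h)
        return None
    return Rex((byte >> 3) & 1, (byte >> 2) & 1, (byte >> 1) & 1, byte & 1)


def extend(rex_bit: int, field3: int) -> int:
    """Prepend a REX bit as the msb of a 3-bit reg, r/m, index, or base field."""
    return (rex_bit << 3) | (field3 & 0b111)


rex = decode_rex(0x4C)                  # 0100 1100b: W=1, R=1, X=0, B=0
modrm = 0xC0                            # mod=11, reg=000, r/m=000
reg = extend(rex.r, (modrm >> 3) & 7)   # register number 8
rm = extend(rex.b, modrm & 7)           # register number 0
```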


1.5 Displacement Bytes 


A displacement (also called an offset) is a signed value that is added to the base of a code segment 
(absolute addressing) or to an instruction pointer (relative addressing), depending on the addressing 
mode. The size of a displacement is 1, 2, or 4 bytes. If an addressing mode requires a displacement, the 
bytes (1, 2, or 4) for the displacement follow the opcode, ModRM, or SIB byte (whichever comes last) 
in the instruction encoding. 


In 64-bit mode, the same ModRM and SIB encodings are used to specify displacement sizes as those 
used in legacy and compatibility modes. However, the displacement is sign-extended to 64 bits during 
effective-address calculations. Also, in 64-bit mode, support is provided for some 64-bit displacement 
and immediate forms of the MOV instruction. See “Immediate Operand Size” in Volume 1 for more 
information on this. 
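
The sign extension of a displacement can be sketched as follows. This is a minimal illustration, assuming the displacement bytes are given in the little-endian order in which they appear in the instruction stream; the function name is hypothetical.

```python
def sign_extend_to_64(raw: bytes) -> int:
    """Interpret a 1-, 2-, or 4-byte displacement as a 64-bit two's-complement value."""
    if len(raw) not in (1, 2, 4):
        raise ValueError("displacement must be 1, 2, or 4 bytes")
    value = int.from_bytes(raw, "little", signed=True)
    return value & 0xFFFF_FFFF_FFFF_FFFF   # keep the low 64 bits


# A one-byte displacement of F8h (-8) becomes FFFF_FFFF_FFFF_FFF8h.
print(hex(sign_extend_to_64(b"\xf8")))
```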


1.6 Immediate Bytes


An immediate is a value—typically an operand value—encoded directly into the instruction. 
Depending on the opcode and the operating mode, the size of an immediate operand can be 1, 2, 4, or 8 
bytes. 64-bit immediates are allowed in 64-bit mode on MOV instructions that load GPRs; otherwise
they are limited to 4 bytes. See “Immediate Operand Size” in Volume 1 for more information. 


If an instruction takes an immediate operand, the bytes (1, 2, 4, or 8) for the immediate follow the 
opcode, ModRM, SIB, or displacement bytes (whichever come last) in the instruction encoding. Some 
128-bit media instructions use the immediate byte as a condition code. 


1.7 RIP-Relative Addressing 


In 64-bit mode, addressing relative to the contents of the 64-bit instruction pointer (program 
counter)—called RIP-relative addressing or PC-relative addressing—is implemented for certain 
instructions. In such cases, the effective address is formed by adding the displacement to the 64-bit 
RIP of the next instruction. 


In the legacy x86 architecture, addressing relative to the instruction pointer is available only in control- 
transfer instructions. In the 64-bit mode, any instruction that uses ModRM addressing can use RIP- 
relative addressing. This feature is particularly useful for addressing data in position-independent code 
and for code that addresses global data. 


Without RIP-relative addressing, ModRM instructions address memory relative to zero. With RIP- 
relative addressing, ModRM instructions can address memory relative to the 64-bit RIP using a signed 
32-bit displacement. This provides an offset range of ±2 Gbytes from the RIP.


Programs usually have many references to data, especially global data, that are not register-based. To
load such a program, the loader typically selects a location for the program in memory and then adjusts 
program references to global data based on the load location. RIP-relative addressing of data makes 
this adjustment unnecessary. 
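
The address calculation described in this section can be sketched in Python. The function and parameter names are illustrative; the instruction address and length are assumed to be known to the caller.

```python
MASK64 = (1 << 64) - 1


def rip_relative_address(instr_addr: int, instr_len: int, disp32: int) -> int:
    """Effective address = RIP of the next instruction + signed 32-bit displacement."""
    if not -(1 << 31) <= disp32 < (1 << 31):
        raise ValueError("disp32 must fit in a signed 32-bit field")
    next_rip = (instr_addr + instr_len) & MASK64
    return (next_rip + disp32) & MASK64


# A 7-byte instruction at 401000h with disp32 = 2000h.
print(hex(rip_relative_address(0x401000, 7, 0x2000)))
```

Because the displacement is relative to the instruction itself, the same encoding remains valid wherever the loader places the program, provided the code and the data it references move together.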


1.7.1 Encoding 


Table 1-16 shows the ModRM and SIB encodings for RIP-relative addressing. Redundant forms of 
32-bit displacement-only addressing exist in the current ModRM and SIB encodings. There is one 
ModRM encoding with several SIB encodings. RIP-relative addressing is encoded using one of the 
redundant forms. In 64-bit mode, the ModRM disp32 (32-bit displacement) encoding ({mod,r/m} = 
00101b) is redefined to be RIP + disp32 rather than displacement-only. 


Table 1-16. Encoding for RIP-Relative Addressing 


ModRM            SIB              Legacy and             64-bit Mode       Additional 64-bit
                                  Compatibility Modes                      Implications
mod = 00         not present      disp32                 RIP + disp32      Zero-based (normal)
r/m = 101                                                                  displacement addressing
                                                                           must use SIB form (see
                                                                           next row).
mod = 00         base = 101²      disp32                 Same as Legacy    None
r/m = 100¹       index = 100³
                 scale = xx

Notes:
1. Encodes the indexed register-indirect addressing mode with 32-bit offset.
2. Base register specification is null (base portion of effective address calculation is set to 0).
3. Index register specification is null (scale*index portion of effective address calculation is set to 0).


1.7.2 REX Prefix and RIP-Relative Addressing 


ModRM encoding for RIP-relative addressing does not depend on a REX prefix. In particular, the r/m 
encoding of 101, used to select RIP-relative addressing, is not affected by the REX prefix. For 
example, selecting R13 (REX.B = 1, r/m = 101) with mod = 00 still results in RIP-relative addressing. 


The four-bit r/m field of ModRM is not fully decoded. Therefore, in order to address R13 with no 
displacement, software must encode it as R13 + 0 using a one-byte displacement of zero. 
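
The special cases for r/m = x101 and r/m = 100 can be illustrated with a small encoder for a [base + displacement] operand. This is a sketch under stated assumptions: no index register is used, the function name and return convention are hypothetical, and the opcode and the REX prefix itself are left to the caller.

```python
def encode_base_disp(reg: int, base: int, disp: int = 0):
    """Return (rex_r, rex_b, encoded_bytes) for a [base + disp] memory operand.

    reg and base are 4-bit register numbers (0..15).
    """
    low = base & 7
    if disp == 0 and low != 0b101:
        mod, disp_bytes = 0b00, b""
    elif -128 <= disp <= 127:
        # Covers RBP/R13 with no displacement: mod = 01 and disp8 = 0.
        mod, disp_bytes = 0b01, disp.to_bytes(1, "little", signed=True)
    else:
        mod, disp_bytes = 0b10, disp.to_bytes(4, "little", signed=True)

    out = bytearray()
    if low == 0b100:
        # RSP/R12 as base requires an SIB byte (r/m = 100).
        out.append((mod << 6) | ((reg & 7) << 3) | 0b100)
        out.append((0b00 << 6) | (0b100 << 3) | 0b100)   # no index, base = x100
    else:
        out.append((mod << 6) | ((reg & 7) << 3) | low)
    out += disp_bytes
    return reg >> 3, base >> 3, bytes(out)


print(encode_base_disp(0, 13))   # R13 base: ModRM 45h followed by disp8 = 00h
```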


1.7.3 Address-Size Prefix and RIP-Relative Addressing


RIP-relative addressing is enabled by 64-bit mode, not by a 64-bit address-size. Conversely, use of the 
address-size prefix (“Address-Size Override Prefix” on page 9) does not disable RIP-relative 
addressing. The effect of the address-size prefix is to truncate and zero-extend the computed effective 
address to 32 bits, like any other addressing mode.


1.8 Encoding Considerations Using REX


Figure 1-6 on page 28 shows four examples of how the R, X, and B bits of the REX prefix are 
concatenated with fields from the ModRM byte, SIB byte, and opcode to specify register and memory 
addressing. 


1.8.1 Byte-Register Addressing 


In the legacy architecture, the byte registers (AH, AL, BH, BL, CH, CL, DH, and DL, shown in 
Figure 2-2 on page 38) are encoded in the ModRM reg or r/m field or in the opcode reg field as 
registers 0 through 7. The REX prefix provides an additional byte-register addressing capability that 
makes the least-significant byte of any GPR available for byte operations (Figure 2-3 on page 39). 
This provides a uniform set of byte, word, doubleword, and quadword registers better suited for 
register allocation by compilers. 
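
The effect of the REX prefix on byte-register selection can be sketched as a lookup. Register numbers 4 through 7 select AH, CH, DH, and BH without a REX prefix, as shown in Figure 2-2. The names SPL, BPL, SIL, DIL, and R8B through R15B used below for the REX forms are the customary AMD64 mnemonics and are not spelled out in this section; the function name is hypothetical.

```python
LEGACY_BYTE_REGS = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REX_BYTE_REGS = (["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL"]
                 + [f"R{n}B" for n in range(8, 16)])


def byte_register_name(field3: int, rex_present: bool, rex_bit: int = 0) -> str:
    """Name the byte register selected by a 3-bit field and an optional REX bit."""
    if not rex_present:
        return LEGACY_BYTE_REGS[field3 & 7]
    return REX_BYTE_REGS[(rex_bit << 3) | (field3 & 7)]


print(byte_register_name(4, rex_present=False))   # high-byte register
print(byte_register_name(4, rex_present=True))    # low byte of RSP
```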


1.8.2 Special Encodings for Registers 


Readers who need to know the details of instruction encodings should be aware that certain 
combinations of the ModRM and SIB fields have special meaning for register encodings. For some of 
these combinations, the instruction fields expanded by the REX prefix are not decoded (treated as 
don’t cares), thereby creating aliases of these encodings in the extended registers. Table 1-17 on 
page 27 describes how each of these cases behaves.


Table 1-17. Special REX Encodings for Registers


ModRM and SIB Encodings²: ModRM byte, mod ≠ 11, r/m¹ = 100 (ESP)
Meaning in Legacy and Compatibility Modes: SIB byte is present.
Implications in Legacy and Compatibility Modes: SIB byte is required for ESP-based addressing.
Additional REX Implications: REX prefix adds a fourth bit (b), which is decoded and modifies the
base register in the SIB byte. Therefore, the SIB byte is also required for R12-based addressing.


ModRM and SIB Encodings²: ModRM byte, mod = 00, r/m¹ = x101 (EBP)
Meaning in Legacy and Compatibility Modes: Base register is not used.
Implications in Legacy and Compatibility Modes: Using EBP without a displacement must be done by
setting mod = 01 with a displacement of 0 (with or without an index register).
Additional REX Implications: REX prefix adds a fourth bit (x), which is not decoded (don’t care).
Therefore, using RBP or R13 without a displacement must be done via mod = 01 with a
displacement of 0.


ModRM and SIB Encodings²: SIB byte, index¹ = x100 (ESP)
Meaning in Legacy and Compatibility Modes: Index register is not used.
Implications in Legacy and Compatibility Modes: ESP cannot be used as an index register.
Additional REX Implications: REX prefix adds a fourth bit (x), which is decoded. Therefore, there
are no additional implications. The expanded index field is used to distinguish RSP from R12,
allowing R12 to be used as an index.


ModRM and SIB Encodings²: SIB byte, base = b101 (EBP), ModRM.mod = 00
Meaning in Legacy and Compatibility Modes: Base register is not used if ModRM.mod = 00.
Implications in Legacy and Compatibility Modes: Base register depends on mod encoding. Using EBP
with a scaled index and without a displacement must be done by setting mod = 01 with a
displacement of 0.
Additional REX Implications: REX prefix adds a fourth bit (b), which is not decoded (don’t care).
Therefore, using RBP or R13 without a displacement must be done via mod = 01 with a
displacement of 0 (with or without an index register).


Notes:


1. The REX-prefix bit is shown in the fourth (most-significant) bit position of the encodings for the ModRM r/m, SIB
index, and SIB base fields. The lower-case “x” for ModRM r/m (rather than the upper-case “B” shown in Figure 1-6
on page 28) indicates that the REX-prefix bit is not decoded (don’t care).


2. For a description of the ModRM and SIB bytes, see “ModRM and SIB Bytes” on page 17.


Examples of Operand Addressing Extension Using REX


Case 1: Register-Register Addressing (No Memory Operand)
Byte sequence: REX prefix (4WRXB), opcode, ModRM byte (mod = 11, reg = rrr, r/m = bbb).
REX.R is concatenated with ModRM.reg to form Rrrr. REX.B is concatenated with ModRM.r/m to form
Bbbb. REX.X is not used.


Case 2: Memory Addressing Without an SIB Byte
Byte sequence: REX prefix (4WRXB), opcode, ModRM byte (mod ≠ 11, reg = rrr, r/m = bbb, where the
ModRM r/m field != 100).
REX.R is concatenated with ModRM.reg to form Rrrr. REX.B is concatenated with ModRM.r/m to form
Bbbb. REX.X is not used.


Case 3: Memory Addressing With an SIB Byte
Byte sequence: REX prefix (4WRXB), opcode, ModRM byte (mod ≠ 11, reg = rrr, r/m = 100), SIB byte
(scale, index = xxx, base = bbb).
REX.R is concatenated with ModRM.reg to form Rrrr. REX.X is concatenated with SIB.index to form
Xxxx. REX.B is concatenated with SIB.base to form Bbbb.


Case 4: Register Operand Coded in Opcode Byte
Byte sequence: REX prefix (4WRXB), opcode byte (op, reg = bbb).
REX.B is concatenated with the opcode reg field to form Bbbb. REX.R is not used. REX.X is not used.


Figure 1-6. Encoding Examples Using REX R, X, and B Bits


1.9 Encoding Using the VEX and XOP Prefixes


An extended escape sequence is introduced by an encoding escape prefix, which establishes the context
and the format of the bytes that follow. The currently defined prefixes fall into two classes: the XOP
prefix and the VEX prefixes (of which there are two). The XOP prefix and the VEX C4h prefix introduce
a three-byte sequence with identical syntax, while the VEX C5h prefix introduces a two-byte escape
sequence with a different syntax.


These escape sequences supply fields used to extend operand specification as well as provide for the 
selection of alternate opcode maps. Encodings support up to two additional operands and the 
addressing of the extended (beyond 7) registers. The specification of two of the operands is 
accomplished using the legacy ModRM and optional SIB bytes with the reg, r/m, index, and base 
fields extended by one bit in a manner analogous to the REX prefix. 


The encodings of the extended SSE instructions utilize extended escape sequences. XOP instructions
use three-byte escape sequences introduced by the XOP prefix. The AVX, FMA, FMA4, and CLMUL 
instruction subsets use three-byte or two-byte escape sequences introduced by the VEX prefixes. 


1.9.1 Three-Byte Escape Sequences 


All the extended instructions can be encoded using a three-byte escape sequence, but certain VEX- 
encoded instructions that comply with the constraints described below in Section 1.9.2, “Two-Byte 
Escape Sequence” can also utilize a two-byte escape sequence. Figure 1-7 below shows the format of 
the three-byte escape sequence which is common to the XOP and VEX-based encodings. 


Byte 0                      Byte 1                     Byte 2
7                     0     7  6  5  4           0     7  6       3  2  1  0
Encoding escape prefix      R  X  B  map_select        W  vvvv       L  pp


Figure 1-7. VEX/XOP Three-byte Escape Sequence Format


Table 1-18. Three-byte Escape Sequence Field Definitions


Byte  Bit    Mnemonic    Description
0     [7:0]  VEX, XOP    Value specific to the extended instruction set
1     [7]    R           Inverted one-bit extension of ModRM reg field
      [6]    X           Inverted one-bit extension of SIB index field
      [5]    B           Inverted one-bit extension, r/m field or SIB base field
      [4:0]  map_select  Opcode map select
2     [7]    W           Default operand size override for a general-purpose register to 64-bit
                         size in 64-bit mode; operand configuration specifier for certain
                         YMM/XMM-based operations.
      [6:3]  vvvv        Source or destination register selector, in ones’ complement format
      [2]    L           Vector length specifier
      [1:0]  pp          Implied 66, F2, or F3 opcode extension


Byte 0 (VEX/XOP Prefix) 


Byte 0 is the encoding escape prefix byte which introduces the encoding escape sequence and 
establishes the context for the bytes that follow. The VEX and XOP prefixes have the following 
encodings: 


• VEX prefix is encoded as C4h
• XOP prefix is encoded as 8Fh


Byte 1 


VEX/XOP.R (Bit 7). The bit-inverted equivalent of the REX.R bit. A one-bit extension of the 
ModRM.reg field in 64-bit mode, permitting access to 16 YMM/XMM and GPR registers. In 32-bit
protected and compatibility modes, the value must be 1. 


VEX/XOP.X (Bit 6). The bit-inverted equivalent of the REX.X bit. A one-bit extension of the 
SIB.index field in 64-bit mode, permitting access to 16 YMM/XMM and GPR registers. In 32-bit 
protected and compatibility modes, this value must be 1. 


VEX/XOP.B (Bit 5). The bit-inverted equivalent of the REX.B bit, available only in the 3-byte prefix 
format. A one-bit extension of either the ModRM.r/m field, to specify a GPR or XMM register, or of
the SIB base field, to specify a GPR. This permits access to all 16 GPR and YMM/XMM registers. In 
32-bit protected and compatibility modes, this bit is ignored. 


VEX/XOP.map_select (Bits [4:0]). The five-bit map_select field is used to select an alternate 
opcode map. The map select encoding spaces for VEX and XOP are disjoint. Table 1-19 below lists 
the encodings for VEX.map_select and Table 1-20 lists the encodings for XOP.map_select. 


Table 1-19. VEX.map_select Encoding 


Binary Value     Opcode Map          Analogous Legacy Opcode Map
00000            Reserved            -
00001            VEX opcode map 1    Secondary (“two-byte”) opcode map
00010            VEX opcode map 2    0F_38h (“three-byte”) opcode map
00011            VEX opcode map 3    0F_3Ah (“three-byte”) opcode map
00100 – 11111    Reserved            -


Table 1-20. XOP.map_select Encoding 


Binary Value Opcode Map 
00000 — 00111 Reserved 
01000 XOP opcode map 8 
01001 XOP opcode map 9 
01010 XOP opcode map 10 (Ah) 
01011 — 11111 Reserved 


AVX instructions are encoded using the VEX opcode maps 1-3. The AVX instruction set includes 
instructions that provide operations similar to most legacy SSE instructions. For those AVX 
instructions that have an analogous legacy SSE instruction, the VEX opcode maps use the same binary 
opcode value and modifiers as the legacy version. The correspondence between the VEX opcode maps
and the legacy opcode maps is shown in Table 1-19 above.


VEX opcode maps 1-3 are also used to encode the FMA4 and FMA instructions. In addition, not all 
legacy SSE instructions have AVX equivalents. Therefore, the VEX opcode maps are not the same as 
the legacy opcode maps. 


The XOP opcode maps are unique to the XOP instructions. The XOP.map_select value is restricted to 
the range [08h:1Fh]. If the value of the XOP.map_select field is less than 8, the first two bytes of the 
three-byte XOP escape sequence are interpreted as a form of the POP instruction. 


Both legacy and extended opcode maps are covered in detail in Appendix A.


Byte 2


VEX/XOP.W (Bit 7). Function is instruction-specific. The bit is often used to configure source 
operand order. 


VEX/XOP.vvvv (Bits [6:3]). Used to specify an additional operand for three- and four-operand
instructions. Encodes an XMM or YMM register in ones’ complement (bit-inverted) form, as shown in
Table 1-21.


Table 1-21. VEX/XOP.vvvv Encoding


Binary Value   Register        Binary Value   Register
0000           XMM15/YMM15     1000           XMM07/YMM07
0001           XMM14/YMM14     1001           XMM06/YMM06
0010           XMM13/YMM13     1010           XMM05/YMM05
0011           XMM12/YMM12     1011           XMM04/YMM04
0100           XMM11/YMM11     1100           XMM03/YMM03
0101           XMM10/YMM10     1101           XMM02/YMM02
0110           XMM09/YMM09     1110           XMM01/YMM01
0111           XMM08/YMM08     1111           XMM00/YMM00


Values 0000b to 0111b are not valid in 32-bit modes. vvvv is typically used to encode the first source
operand, but for the VPSLLDQ, VPSRLDQ, VPSRLW, VPSRLD, VPSRLQ, VPSRAW, VPSRAD,
VPSLLW, VPSLLD, and VPSLLQ shift instructions, the field specifies the destination register.
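
The ones' complement mapping of Table 1-21 amounts to a four-bit inversion, as the following Python sketch shows. The function names are hypothetical.

```python
def encode_vvvv(reg: int) -> int:
    """Return the vvvv field value that selects XMM/YMM register number reg."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must be in the range 0..15")
    return ~reg & 0xF        # four-bit ones' complement


def decode_vvvv(vvvv: int) -> int:
    """Return the register number selected by a vvvv field value."""
    return ~vvvv & 0xF       # the inversion is its own inverse


print(format(encode_vvvv(0), "04b"))    # register 0
print(format(encode_vvvv(15), "04b"))   # register 15
```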


VEX/XOP.L (Bit 2). L = 0 specifies 128-bit vector length (XMM registers/128-bit memory
locations). L = 1 specifies 256-bit vector length (YMM registers/256-bit memory locations). For SSE or
XOP instructions with scalar operands, the L bit is ignored. Some vector SSE instructions support only
the 128-bit vector size. For these instructions, L is cleared to 0.


VEX/XOP.pp (Bits [1:0]). Specifies an implied 66h, F2h, or F3h opcode extension which is used in a
way analogous to the legacy instruction encodings to extend the opcode encoding space. The 
correspondence between the encoding of the VEX/XOP.pp field and its function as an opcode modifier 
is shown in Table 1-22. The legacy prefixes 66h, F2h, and F3h are not allowed in the encoding of 
extended instructions. 


Table 1-22. VEX/XOP.pp Encoding 


Binary Value | Implied Prefix 
00 None 
01 66h 
10 F3h 
11 F2h 
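
The field definitions for the three-byte escape sequence can be drawn together in a small decoder. This is a sketch only: the names Vex3 and decode_three_byte are hypothetical, the R, X, and B results are returned already un-inverted (that is, as their REX equivalents), and no check of the operating mode is attempted.

```python
from typing import NamedTuple, Optional

IMPLIED_PREFIX = {0b00: None, 0b01: 0x66, 0b10: 0xF3, 0b11: 0xF2}


class Vex3(NamedTuple):
    r: int                          # REX.R equivalent (stored bit inverted)
    x: int                          # REX.X equivalent
    b: int                          # REX.B equivalent
    map_select: int
    w: int
    reg: int                        # register number selected by vvvv
    l: int
    implied_prefix: Optional[int]


def decode_three_byte(seq: bytes) -> Vex3:
    """Decode a C4h (VEX) or 8Fh (XOP) three-byte escape sequence."""
    if len(seq) != 3 or seq[0] not in (0xC4, 0x8F):
        raise ValueError("expected a three-byte VEX or XOP escape sequence")
    b1, b2 = seq[1], seq[2]
    map_select = b1 & 0x1F
    if seq[0] == 0x8F and map_select < 8:
        raise ValueError("8Fh with map_select < 8 is a form of POP, not XOP")
    return Vex3(
        r=((b1 >> 7) & 1) ^ 1,
        x=((b1 >> 6) & 1) ^ 1,
        b=((b1 >> 5) & 1) ^ 1,
        map_select=map_select,
        w=(b2 >> 7) & 1,
        reg=~(b2 >> 3) & 0xF,
        l=(b2 >> 2) & 1,
        implied_prefix=IMPLIED_PREFIX[b2 & 0b11],
    )


print(decode_three_byte(bytes([0xC4, 0xE2, 0x79])))
```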


1.9.2 Two-Byte Escape Sequence 


All VEX-encoded instructions can be encoded using the three-byte escape sequence, but certain 
instructions can also be encoded utilizing a more compact, two-byte VEX escape sequence. The 
format of the two-byte escape sequence is shown in Figure 1-8 below.


Byte 0        Byte 1
7        0    7  6       3  2  1  0
VEX           R  vvvv       L  pp


Figure 1-8. VEX Two-byte Escape Sequence Format


Table 1-23. VEX Two-byte Escape Sequence Field Definitions


Prefix Byte  Bit    Mnemonic  Description
0            [7:0]  VEX       VEX 2-byte encoding escape prefix
1            [7]    R         Inverted one-bit extension of ModRM.reg field
             [6:3]  vvvv      Source or destination register selector, in ones’
                              complement format.
             [2]    L         Vector length specifier
             [1:0]  pp        Implied 66, F2, or F3 opcode extension.


Byte 0 (VEX Prefix)


The VEX prefix for the two-byte escape sequence is encoded as C5h.


Byte 1


Note that bit 7 of this byte is used to encode VEX.R instead of VEX.W as in the three-byte escape
sequence form. The R, vvvv, L, and pp fields are defined as in the three-byte escape sequence. 


When the two-byte escape sequence is used, specific fields from the three-byte format take on fixed 
values as shown in Table 1-24 below. 


Table 1-24. Fixed Field Values for VEX 2-Byte Format 


VEX Field Value 
X 1 
B 1 
W 0 
map_select 00001b 


Although they may be encoded using the VEX three-byte escape sequence, all instructions that 
conform with the constraints listed in Table 1-24 may be encoded using the two-byte escape sequence. 
Note that the implied value of map_select is 00001b, which means that only instructions included in 
the VEX opcode map 1 may be encoded using this format. 
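
The constraints in Table 1-24 suggest a simple test for whether a three-byte VEX sequence has a two-byte equivalent. The following Python sketch performs that test and, when it passes, builds the C5h form. The function name is hypothetical, and the X and B values compared are the stored (inverted) field values.

```python
from typing import Optional


def to_two_byte(vex3: bytes) -> Optional[bytes]:
    """Return the equivalent C5h sequence, or None if the constraints are not met."""
    if len(vex3) != 3 or vex3[0] != 0xC4:
        raise ValueError("expected a three-byte VEX (C4h) escape sequence")
    b1, b2 = vex3[1], vex3[2]
    x_field = (b1 >> 6) & 1
    b_field = (b1 >> 5) & 1
    map_select = b1 & 0x1F
    w = (b2 >> 7) & 1
    if not (x_field == 1 and b_field == 1 and w == 0 and map_select == 0b00001):
        return None
    r_field = b1 & 0x80                   # R moves into bit 7 of byte 1
    return bytes([0xC5, r_field | (b2 & 0x7F)])   # vvvv, L, pp are unchanged


print(to_two_byte(bytes([0xC4, 0xE1, 0x78])))
```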


VEX-encoded instructions that use the other defined values of map_select (00010b and 00011b)
cannot be encoded using the two-byte escape sequence format. Note that the VEX.pp field value is
explicitly encoded in this form and can be used to specify any of the implied legacy prefixes as defined
in Table 1-22.


2 Instruction Overview


2.1 Instruction Groups


For easier reference, the instruction descriptions are divided into five groups based on usage. The 
following sections describe the function, mnemonic syntax, opcodes, affected flags, and possible 
exceptions generated by all instructions in the AMD64 architecture: 


• Chapter 3, “General-Purpose Instruction Reference”: The general-purpose instructions are used
in basic software execution. Most of these load, store, or operate on data in the general-purpose
registers (GPRs), in memory, or in both. Other instructions are used to alter sequential program
flow by branching to other locations within the program or to entirely different programs.


• Chapter 4, “System Instruction Reference”: The system instructions establish the processor
operating mode, access processor resources, handle program and system errors, and manage
memory.


• “SSE Instruction Reference” in Volume 4: The Streaming SIMD Extensions (SSE) instructions
load, store, or operate on data located in the YMM/XMM registers. These instructions define both
vector and scalar operations on floating-point and integer data types. They include the SSE and
SSE2 instructions that operate on the YMM/XMM registers. Some of these instructions convert
source operands in YMM/XMM registers to destination operands in GPR, MMX, or x87 registers
or otherwise affect YMM/XMM state.


• “64-Bit Media Instruction Reference” in Volume 5: The 64-bit media instructions load, store, or
operate on data located in the 64-bit MMX registers. These instructions define both vector and
scalar operations on integer and floating-point data types. They include the legacy MMX™
instructions, the 3DNow!™ instructions, and the AMD extensions to the MMX and 3DNow!
instruction sets. Some of these instructions convert source operands in MMX registers to
destination operands in GPR, YMM/XMM, or x87 registers or otherwise affect MMX state.


• “x87 Floating-Point Instruction Reference” in Volume 5: The x87 instructions are used in legacy
floating-point applications. Most of these instructions load, store, or operate on data located in the
x87 ST(0)-ST(7) stack registers (the FPR0–FPR7 physical registers). The remaining instructions
within this category are used to manage the x87 floating-point environment.


The description of each instruction covers its behavior in all operating modes, including legacy mode 
(real, virtual-8086, and protected modes) and long mode (compatibility and 64-bit modes). Details of 
certain kinds of complex behavior—such as control-flow changes in CALL, INT, or FXSAVE 
instructions—have cross-references in the instruction-detail pages to detailed descriptions in volumes 
1 and 2. 


Two mnemonics, CMPSD and MOVSD, are each shared by two different instructions. Assemblers can
distinguish them on the basis of the number and type of operands with which they are used.


2.2 Reference-Page Format


Figure 2-1 on page 37 shows the format of an instruction-detail page. The instruction mnemonic is 
shown in bold at the top-left, along with its name. In this example, POPFD is the mnemonic and POP 
to EFLAGS Doubleword is the name. Next, there is a general description of the instruction’s operation. 
Many descriptions have cross-references to more detail in other parts of the manual. 


Beneath the general description, the mnemonic is shown again, together with the related opcode(s) and 
a description summary. Related instructions are listed below this, followed by a table showing the 
flags that the instruction can affect. Finally, each instruction has a summary of the possible exceptions 
that can occur when executing the instruction. The columns labeled “Real” and “Virtual-8086” apply 
only to execution in legacy mode. The column labeled “Protected” applies both to legacy mode and 
long mode, because long mode is a superset of legacy protected mode. 


The 128-bit and 64-bit media instructions also have diagrams illustrating the operation. A few 
instructions have examples or pseudocode describing the action.


AAM                                                    ASCII Adjust After Multiply


Converts the value in the AL register from binary to two unpacked BCD digits in the
AH (most significant) and AL (least significant) registers using the following formula:


AH = (AL/10d)
AL = (AL mod 10d).


In most modern assemblers, the AAM instruction adjusts to base-10 values. However,
by coding the instruction directly in binary, it can adjust to any base specified by the
immediate byte value (ib) suffixed onto the D4h opcode. For example, code D408h for
octal, D40Ah for decimal, and D40Ch for duodecimal (base 12).


Using this instruction in 64-bit mode generates an invalid-opcode exception.


Mnemonic   Opcode   Description
AAM        D4 0A    Create a pair of unpacked BCD values in AH and AL.
                    (Invalid in 64-bit mode.)
(None)     D4 ib    Create a pair of unpacked values to the immediate byte base.
                    (Invalid in 64-bit mode.)


Related Instructions
AAA, AAD, AAS


rFLAGS Affected


Flag   ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
State                                            U                   M    M    U    M    U
Bit    21   20   19   18   17   16   14   13-12  11   10   9    8    7    6    4    2    0


Note: Bits 31-22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M. Unaffected flags are blank. Undefined flags are U.


Exceptions


Exception             Real   Virtual-8086   Protected   Cause of Exception
Divide by zero, #DE   X      X              X           8-bit immediate value was 0.
Invalid opcode, #UD                         X           This instruction was executed in 64-bit mode.


Callouts in the figure:
• Alphabetic mnemonic locator (page margin).
• “M” means the flag is either set or cleared, depending on the result.
• Possible exceptions and causes, by mode of operation.
• “Protected” column covers both legacy and long mode.


Figure 2-1. Format of Instruction-Detail Pages


2.3 Summary of Registers and Data Types


This section summarizes the registers available to software using the five instruction subsets described 
in “Instruction Groups” on page 35. For details on the organization and use of these registers, see their 
respective chapters in volumes 1 and 2. 


2.3.1 General-Purpose Instructions 


Registers. The size and number of general-purpose registers (GPRs) depend on the operating
mode, as do the sizes of the flags and instruction-pointer registers. Figure 2-2 shows the registers
available in legacy and compatibility modes.


register   high     low
encoding   8-bit    8-bit    16-bit   32-bit
0          AH (4)   AL       AX       EAX
3          BH (7)   BL       BX       EBX
1          CH (5)   CL       CX       ECX
2          DH (6)   DL       DX       EDX
6                            SI       ESI
7                            DI       EDI
5                            BP       EBP
4                            SP       ESP

                             FLAGS    EFLAGS
                             IP       EIP

Bit positions: 31 to 16 (upper half), 15 to 0 (lower half).


Figure 2-2. General Registers in Legacy and Compatibility Modes 


Figure 2-3 on page 39 shows the registers accessible in 64-bit mode. Compared with legacy mode,
registers become 64 bits wide, eight new data registers (R8–R15) are added, and the low byte of all 16
GPRs is available for byte operations. The four high-byte registers of legacy mode (AH, BH, CH, and
DH) are not available if the REX prefix is used. Results of doubleword operations are zero-extended
to 64 bits, but the high bits of the destination register are not modified by word and byte
operations in 64-bit mode. The RFLAGS register is 64 bits wide, but the high 32 bits are reserved. They
can be written with anything but they read as zeros (RAZ).
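
The difference between doubleword writes and word or byte writes can be modeled in a few lines of Python. This sketch treats a GPR as a 64-bit integer and covers only the low-byte registers, not AH, BH, CH, or DH; the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def write_gpr(old: int, value: int, size: int) -> int:
    """Return the new 64-bit register contents after a write of the given size."""
    if size == 64:
        return value & MASK64
    if size == 32:
        return value & 0xFFFF_FFFF                 # upper 32 bits become zero
    if size in (8, 16):
        mask = (1 << size) - 1
        return (old & ~mask & MASK64) | (value & mask)   # upper bits preserved
    raise ValueError("size must be 8, 16, 32, or 64")


old = 0xFFFF_FFFF_FFFF_FFFF
print(hex(write_gpr(old, 0x1234, 32)))   # zero-extended
print(hex(write_gpr(old, 0x1234, 16)))   # upper 48 bits unchanged
```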


Register   high     low
encoding   8-bit    8-bit    16-bit   32-bit   64-bit
0          AH*      AL       AX       EAX      RAX
3          BH*      BL       BX       EBX      RBX
1          CH*      CL       CX       ECX      RCX
2          DH*      DL       DX       EDX      RDX
6                   SIL**    SI       ESI      RSI
7                   DIL**    DI       EDI      RDI
5                   BPL**    BP       EBP      RBP
4                   SPL**    SP       ESP      RSP
8                   R8B      R8W      R8D      R8
9                   R9B      R9W      R9D      R9
10                  R10B     R10W     R10D     R10
11                  R11B     R11W     R11D     R11
12                  R12B     R12W     R12D     R12
13                  R13B     R13W     R13D     R13
14                  R14B     R14W     R14D     R14
15                  R15B     R15W     R15D     R15

RFLAGS (bits 63:32 reserved, bits 31:0 defined)
RIP (bits 63:0)

Bits 63:32 are zero-extended for 32-bit operands. They are not modified for 16-bit or 8-bit operands.

* Not addressable in REX prefix instruction forms
** Only addressable in REX prefix instruction forms


Figure 2-3. General Registers in 64-Bit Mode

For most instructions running in 64-bit mode, access to the extended GPRs requires either a REX
instruction modification prefix or extended encoding using the VEX or XOP sequences (page 16).


Figure 2-4 shows the segment registers which, like the instruction pointer, are used by all instructions.
In legacy and compatibility modes, all segments are accessible. In 64-bit mode, which uses the flat 
(non-segmented) memory model, only the CS, FS, and GS segments are recognized, whereas the 
contents of the DS, ES, and SS segment registers are ignored (the base for each of these segments is 
assumed to be zero, and neither their segment limit nor attributes are checked). For details, see 
“Segmented Virtual Memory” in Volume 2. 


Legacy Mode and Compatibility Mode: CS, DS, ES, FS, GS, SS
64-Bit Mode: CS, FS, GS (the contents of DS, ES, and SS are ignored)


Figure 2-4. Segment Registers 


Data Types. Figure 2-5 on page 41 shows the general-purpose data types. They are all scalar, integer 
data types. The 64-bit (quadword) data types are only available in 64-bit mode, and for most 
instructions they require a REX instruction prefix.


Signed Integer and Unsigned Integer (the most-significant bit of a signed integer is the sign bit, s)

Data Type          Size                           Bits
Double Quadword    16 bytes (64-bit mode only)    127:0
Quadword           8 bytes (64-bit mode only)     63:0
Doubleword         4 bytes                        31:0
Word               2 bytes                        15:0
Byte               1 byte                         7:0

Packed BCD (two 4-bit digits per byte)
BCD Digit (4 bits)


Figure 2-5. General-Purpose Data Types 


2.3.2 System Instructions 


Registers. The system instructions use several specialized registers shown in Figure 2-6 on page 42. 
System software uses these registers to, among other things, manage the processor’s operating 
environment, define system resource characteristics, and monitor software execution. With the 
exception of the RFLAGS register, system registers can be read and written only from privileged 
software. 


All system registers are 64 bits wide, except for the descriptor-table registers and the task register, 
which include 64-bit base-address fields and other fields.


[Figure: system register groups, with register names listed where they are legible in the source]

Control Registers
System-Flags Register: RFLAGS
Debug Registers
Task Register
Extended-Feature-Enable Register
System-Configuration Register
System-Linkage Registers: STAR, LSTAR, CSTAR, SFMASK
Debug-Extension Registers: LastIntToIP
Memory-Typing Registers: TOP_MEM, TOP_MEM2
Performance-Monitoring Registers: PerfEvtSeln, PerfCtrn
Machine-Check Registers: MCG_CAP


Figure 2-6. System Registers 


Data Structures. Figure 2-7 on page 43 shows the system data structures. These are created and 
maintained by system software for use in protected mode. A processor running in protected mode uses 
these data structures to manage memory and protection, and to store program-state information when 
an interrupt or task switch occurs.


[Figure: system data structures]

Segment Descriptors (Contained in Descriptor Tables)
Task-State Segment
Descriptor Tables: Global-Descriptor Table, Interrupt-Descriptor Table, Local-Descriptor Table
Page-Translation Tables: Page-Map Level-4, Page-Directory Pointer, Page Directory, Page Table


Figure 2-7. System Data Structures 


2.3.3 SSE Instructions 


Registers. The SSE instructions operate primarily on 128-bit and 256-bit floating-point vector
operands located in the 256-bit YMM/XMM registers. Each 128-bit XMM register is defined as the
lower octword of the corresponding YMM register. The number of available YMM/XMM data
registers depends on the operating mode, as shown in Figure 2-8 below. In legacy and compatibility
modes, eight YMM/XMM registers (YMM/XMM0-7) are available. In 64-bit mode, eight additional
YMM/XMM data registers (YMM/XMM8-15) are available. These eight additional registers are
addressed via the encoding extensions provided by the REX, VEX, and XOP prefixes.

The MXCSR register contains floating-point and other control and status flags used by the 128-bit
media instructions. Some 128-bit media instructions also use the GPR (Figure 2-2 and Figure 2-3) and
the MMX registers (Figure 2-12 on page 48) or set or clear flags in the rFLAGS register (see
Figure 2-2 and Figure 2-3).


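(Figure content: the 256-bit YMM registers occupy bits 255:0, with each XMM register in bits 127:0 of
the corresponding YMM register. YMM/XMM0-7 are available in all modes; YMM/XMM8-15 are
available only in 64-bit mode. The Media eXtension Control and Status Register, MXCSR, occupies
bits 31:0.)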


Figure 2-8. SSE Registers 


Data Types. The SSE instruction set architecture provides support for 128-bit and 256-bit packed
floating-point and integer data types as well as integer and floating-point scalars. Figure 2-9 below
shows the 128-bit data types. Figure 2-10 on page 46 and Figure 2-11 on page 47 show the 256-bit
data types. The floating-point data types include IEEE-754 single precision and double precision
types.

(Figure content, bits 127:0:
 Vector (Packed) Floating-Point: double precision and single precision
 Vector (Packed) Signed Integer: quadword, doubleword, word, byte
 Vector (Packed) Unsigned Integer: quadword, doubleword, word, byte
 Scalar Floating-Point: double precision and single precision (see Note 1)
 Scalar Signed Integers: double quadword (octword), quadword, doubleword
 Scalar Unsigned Integers: double quadword (octword), quadword, doubleword)

Note: 1) A 16 bit Half-Precision Floating-Point Scalar is also defined. 


Figure 2-9. 128-Bit SSE Data Types 


(Figure content, shown for bits 255:128 and bits 127:0:
 Vector (Packed) Floating-Point: double precision and single precision
 Vector (Packed) Signed Integer: double quadword (octword), quadword, doubleword, word, byte)


Figure 2-10. SSE 256-bit Data Types 




(Figure content:
 Vector (Packed) Unsigned Integer, shown for bits 255:128 and bits 127:0: double quadword
 (octword), quadword, doubleword, word, byte
 Scalar Floating-Point: double precision and single precision (see Note 1)
 Scalar Signed Integers: double quadword, quadword, doubleword
 Scalar Unsigned Integers: double quadword, quadword)

Note: 1) A 16 bit Half-Precision Floating-Point Scalar is also defined. 


Figure 2-11. SSE 256-Bit Data Types (Continued)


2.3.4 64-Bit Media Instructions


Registers. The 64-bit media instructions use the eight 64-bit MMX registers, as shown in 
Figure 2-12. These registers are mapped onto the x87 floating-point registers, and 64-bit media 
instructions write the x87 tag word in a way that prevents an x87 instruction from using MMX data. 


Some 64-bit media instructions also use the GPR (Figure 2-2 and Figure 2-3) and the XMM registers 
(Figure 2-8). 


(Figure content: MMX Data Registers mmx0 through mmx7, each occupying bits 63:0.)


Figure 2-12. 64-Bit Media Registers 


Data Types. Figure 2-13 on page 49 shows the 64-bit media data types. They include floating-point
and integer vectors and integer scalars. The floating-point data type, used by 3DNow!™ instructions,
consists of a packed vector of two IEEE-754 32-bit single-precision data types. Unlike other kinds of
floating-point instructions, however, the 3DNow! instructions do not generate floating-point
exceptions. For this reason, there is no register for reporting or controlling the status of exceptions in
the 64-bit-media instruction subset.


(Figure content, bits 63:0:
 Vector (Packed) Single-Precision Floating-Point
 Vector (Packed) Signed Integers: doubleword, word, byte
 Vector (Packed) Unsigned Integers: doubleword, word, byte
 Signed Integers: quadword, doubleword
 Unsigned Integers: quadword, doubleword)
Figure 2-13. 64-Bit Media Data Types


2.3.5 x87 Floating-Point Instructions


Registers. The x87 floating-point instructions use the x87 registers shown in Figure 2-14. There are 
eight 80-bit data registers, three 16-bit registers that hold the x87 control word, status word, and tag 
word, and three registers (last instruction pointer, last opcode, last data pointer) that hold information 
about the last x87 operation. 


The physical data registers are named FPR0 through FPR7, although x87 software references these
registers as a stack of registers, named ST(0) through ST(7). The x87 instructions store operands only
in their own 80-bit floating-point registers or in memory. They do not access the GPR or XMM registers.


(Figure content: x87 Data Registers fpr0 through fpr7, each occupying bits 79:0; Instruction Pointer
(rIP), Data Pointer (rDP), and Opcode registers; Control Word, Status Word, and Tag Word.)


Figure 2-14. x87 Registers 


Data Types. Figure 2-15 on page 51 shows all x87 data types. They include three floating-point
formats (80-bit double-extended precision, 64-bit double precision, and 32-bit single precision), three
signed-integer formats (quadword, doubleword, and word), and an 80-bit packed binary-coded
decimal (BCD) format.

(Figure content:
 Floating-Point: Double-Extended Precision, Double Precision, Single Precision
 Signed Integer: Quadword (8 bytes), Doubleword (4 bytes), Word
 Binary-Coded Decimal (BCD): Packed Decimal, bits 79:0)


Figure 2-15. x87 Data Types 


2.4 Summary of Exceptions 


Table 2-1 on page 52 lists all possible exceptions. The table shows the interrupt-vector numbers,
names, mnemonics, source, and possible causes. Exceptions that apply to specific instructions are
documented with each instruction in the instruction-detail pages that follow.

Table 2-1. Interrupt-Vector Source and Cause

Vector | Interrupt (Exception)                | Mnemonic | Source             | Cause
0      | Divide-By-Zero-Error                 | #DE      | Software           | DIV, IDIV, AAM instructions
1      | Debug                                | #DB      | Internal           | Instruction accesses and data accesses
2      | Non-Maskable-Interrupt               | #NMI     | External           | External NMI signal
3      | Breakpoint                           | #BP      | Software           | INT3 instruction
4      | Overflow                             | #OF      | Software           | INTO instruction
5      | Bound-Range                          | #BR      | Software           | BOUND instruction
6      | Invalid-Opcode                       | #UD      | Internal           | Invalid instructions
7      | Device-Not-Available                 | #NM      | Internal           | x87 instructions
8      | Double-Fault                         | #DF      | Internal           | Interrupt during an interrupt
9      | Coprocessor-Segment-Overrun          | -        | External           | Unsupported (reserved)
10     | Invalid-TSS                          | #TS      | Internal           | Task-state segment access and task switch
11     | Segment-Not-Present                  | #NP      | Internal           | Segment access through a descriptor
12     | Stack                                | #SS      | Internal           | SS register loads and stack references
13     | General-Protection                   | #GP      | Internal           | Memory accesses and protection checks
14     | Page-Fault                           | #PF      | Internal           | Memory accesses when paging enabled
15     | Reserved                             | -        |                    |
16     | x87 Floating-Point Exception-Pending | #MF      | Software           | x87 floating-point instructions
17     | Alignment-Check                      | #AC      | Internal           | Memory accesses
18     | Machine-Check                        | #MC      | Internal, External | Model specific
19     | SIMD Floating-Point                  | #XF      | Internal           | 128-bit media floating-point instructions
20     | Reserved                             | -        |                    |
21     | Control-Protection                   | #CP      | Internal           | Shadow Stack Protection checks
22-27  | Reserved (Internal and External)     | -        |                    |
28     | Hypervisor Injection Exception       | #HV      | Software           | Event injection
29     | VMM Communication Exception          | #VC      | Internal           | Virtualization event
30     | SVM Security Exception               | #SX      | External           | Security-sensitive events
31     | Reserved (Internal and External)     | -        |                    |
0-255  | External Interrupts (Maskable)       | #INTR    | External           | External interrupt signal
0-255  | Software Interrupts                  | -        | Software           | INTn instruction


2.5 Notation


2.5.1 Mnemonic Syntax 


Each instruction has a syntax that includes the mnemonic and any operands that the instruction can 
take. Figure 2-16 shows an example of a syntax in which the instruction takes two operands. In most 
instructions that take two operands, the first (left-most) operand is both a source operand (the first 
source operand) and the destination operand. The second (right-most) operand serves only as a source, 
not a destination. 


ADDPD xmm1, xmm2/mem128

   ADDPD          Mnemonic
   xmm1           First Source Operand and Destination Operand
   xmm2/mem128    Second Source Operand

Figure 2-16. Syntax for Typical Two-Operand Instruction


The following notation is used to denote the size and type of source and destination operands: 
• cReg—Control register.
• dReg—Debug register.
• imm8—Byte (8-bit) immediate.
• imm16—Word (16-bit) immediate.
• imm16/32—Word (16-bit) or doubleword (32-bit) immediate.
• imm32—Doubleword (32-bit) immediate.
• imm32/64—Doubleword (32-bit) or quadword (64-bit) immediate.
• imm64—Quadword (64-bit) immediate.
• mem—An operand of unspecified size in memory.
• mem8—Byte (8-bit) operand in memory.
• mem16—Word (16-bit) operand in memory.
• mem16/32—Word (16-bit) or doubleword (32-bit) operand in memory.
• mem32—Doubleword (32-bit) operand in memory.
• mem32/48—Doubleword (32-bit) or 48-bit operand in memory.
• mem48—48-bit operand in memory.
• mem64—Quadword (64-bit) operand in memory.

• mem128—Double quadword (128-bit) operand in memory.
• mem16:16—Two sequential word (16-bit) operands in memory.
• mem16:32—A doubleword (32-bit) operand followed by a word (16-bit) operand in memory.
• mem32real—Single-precision (32-bit) floating-point operand in memory.
• mem16int—Word (16-bit) integer operand in memory.
• mem32int—Doubleword (32-bit) integer operand in memory.
• mem64real—Double-precision (64-bit) floating-point operand in memory.
• mem64int—Quadword (64-bit) integer operand in memory.
• mem80real—Double-extended-precision (80-bit) floating-point operand in memory.
• mem80dec—80-bit packed BCD operand in memory, containing 18 4-bit BCD digits.
• mem2env—16-bit x87 control word or x87 status word.
• mem14/28env—14-byte or 28-byte x87 environment. The x87 environment consists of the x87
  control word, x87 status word, x87 tag word, last non-control instruction pointer, last data pointer,
  and opcode of the last non-control instruction completed.
• mem94/108env—94-byte or 108-byte x87 environment and register stack.
• mem512env—512-byte environment for 128-bit media, 64-bit media, and x87 instructions.
• mmx—Quadword (64-bit) operand in an MMX register.
• mmx1—Quadword (64-bit) operand in an MMX register, specified as the left-most (first) operand
  in the instruction syntax.
• mmx2—Quadword (64-bit) operand in an MMX register, specified as the right-most (second)
  operand in the instruction syntax.
• mmx/mem32—Doubleword (32-bit) operand in an MMX register or memory.
• mmx/mem64—Quadword (64-bit) operand in an MMX register or memory.
• mmx1/mem64—Quadword (64-bit) operand in an MMX register or memory, specified as the
  left-most (first) operand in the instruction syntax.
• mmx2/mem64—Quadword (64-bit) operand in an MMX register or memory, specified as the
  right-most (second) operand in the instruction syntax.
• moffset—Direct memory offset that specifies an operand in memory.
• moffset8—Direct memory offset that specifies a byte (8-bit) operand in memory.
• moffset16—Direct memory offset that specifies a word (16-bit) operand in memory.
• moffset32—Direct memory offset that specifies a doubleword (32-bit) operand in memory.
• moffset64—Direct memory offset that specifies a quadword (64-bit) operand in memory.
• pntr16:16—Far pointer with 16-bit selector and 16-bit offset.
• pntr16:32—Far pointer with 16-bit selector and 32-bit offset.
• reg—Operand of unspecified size in a GPR register.
• reg8—Byte (8-bit) operand in a GPR register.

• reg16—Word (16-bit) operand in a GPR register.
• reg16/32—Word (16-bit) or doubleword (32-bit) operand in a GPR register.
• reg32—Doubleword (32-bit) operand in a GPR register.
• reg64—Quadword (64-bit) operand in a GPR register.
• reg/mem8—Byte (8-bit) operand in a GPR register or memory.
• reg/mem16—Word (16-bit) operand in a GPR register or memory.
• reg/mem32—Doubleword (32-bit) operand in a GPR register or memory.
• reg/mem64—Quadword (64-bit) operand in a GPR register or memory.
• rel8off—Signed 8-bit offset relative to the instruction pointer.
• rel16off—Signed 16-bit offset relative to the instruction pointer.
• rel32off—Signed 32-bit offset relative to the instruction pointer.
• segReg or sReg—Word (16-bit) operand in a segment register.
• ST(0)—x87 stack register 0.
• ST(i)—x87 stack register i, where i is between 0 and 7.
• xmm—Double quadword (128-bit) operand in an XMM register.
• xmm1—Double quadword (128-bit) operand in an XMM register, specified as the left-most (first)
  operand in the instruction syntax.
• xmm2—Double quadword (128-bit) operand in an XMM register, specified as the right-most
  (second) operand in the instruction syntax.
• xmm/mem64—Quadword (64-bit) operand in a 128-bit XMM register or memory.
• xmm/mem128—Double quadword (128-bit) operand in an XMM register or memory.
• xmm1/mem128—Double quadword (128-bit) operand in an XMM register or memory, specified as
  the left-most (first) operand in the instruction syntax.
• xmm2/mem128—Double quadword (128-bit) operand in an XMM register or memory, specified as
  the right-most (second) operand in the instruction syntax.
• ymm—Double octword (256-bit) operand in a YMM register.
• ymm1—Double octword (256-bit) operand in a YMM register, specified as the left-most (first)
  operand in the instruction syntax.
• ymm2—Double octword (256-bit) operand in a YMM register, specified as the right-most
  (second) operand in the instruction syntax.
• ymm/mem64—Quadword (64-bit) operand in a 256-bit YMM register or memory.
• ymm/mem128—Double quadword (128-bit) operand in a YMM register or memory.
• ymm1/mem256—Double octword (256-bit) operand in a YMM register or memory, specified as
  the left-most (first) operand in the instruction syntax.
• ymm2/mem256—Double octword (256-bit) operand in a YMM register or memory, specified as
  the right-most (second) operand in the instruction syntax.


2.5.2 Opcode Syntax 


In addition to the notation shown above in “Mnemonic Syntax” on page 53, the following notation 
indicates the size and type of operands in the syntax of an instruction opcode: 


• /digit—Indicates that the ModRM byte specifies only one register or memory (r/m) operand. The
  digit is specified by the ModRM reg field and is used as an instruction-opcode extension. Valid
  digit values range from 0 to 7.
• /r—Indicates that the ModRM byte specifies both a register operand and a reg/mem (register or
  memory) operand.
• cb, cw, cd, cp—Specifies a code-offset value and possibly a new code-segment register value. The
  value following the opcode is either one byte (cb), two bytes (cw), four bytes (cd), or six bytes
  (cp).
• ib, iw, id, iq—Specifies an immediate-operand value. The opcode determines whether the value is
  signed or unsigned. The value following the opcode, ModRM, or SIB byte is either one byte (ib),
  two bytes (iw), four bytes (id), or eight bytes (iq). Word, doubleword, and quadword values start
  with the low-order byte.
• +rb, +rw, +rd, +rq—Specifies a register value that is added to the hexadecimal byte on the left,
  forming a one-byte opcode. The result is an instruction that operates on the register specified by
  the register code. Valid register-code values are shown in Table 2-2.
• m64—Specifies a quadword (64-bit) operand in memory.
• +i—Specifies an x87 floating-point stack operand, ST(i). The value is used only with x87
  floating-point instructions. It is added to the hexadecimal byte on the left, forming a one-byte
  opcode. Valid values range from 0 to 7.


Table 2-2. +rb, +rw, +rd, and +rq Register Value

REX.B Bit¹         | Value | +rb      | +rw  | +rd  | +rq
0 or no REX Prefix | 0     | AL       | AX   | EAX  | RAX
                   | 1     | CL       | CX   | ECX  | RCX
                   | 2     | DL       | DX   | EDX  | RDX
                   | 3     | BL       | BX   | EBX  | RBX
                   | 4     | AH, SPL¹ | SP   | ESP  | RSP
                   | 5     | CH, BPL¹ | BP   | EBP  | RBP
                   | 6     | DH, SIL¹ | SI   | ESI  | RSI
                   | 7     | BH, DIL¹ | DI   | EDI  | RDI
1                  | 0     | R8B      | R8W  | R8D  | R8
                   | 1     | R9B      | R9W  | R9D  | R9
                   | 2     | R10B     | R10W | R10D | R10
                   | 3     | R11B     | R11W | R11D | R11
                   | 4     | R12B     | R12W | R12D | R12
                   | 5     | R13B     | R13W | R13D | R13
                   | 6     | R14B     | R14W | R14D | R14
                   | 7     | R15B     | R15W | R15D | R15

1. See “REX Prefix” on page 14.
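
The register-value notation of Table 2-2 can be illustrated with a short Python sketch. The function name encode_plus_r and its interface are hypothetical and not part of the architecture definition. The sketch assumes a register number from 0 to 15, whose low three bits are added to the base opcode byte and whose top bit is carried in REX.B, which is how the table pairs values 0 through 7 with each REX.B setting. The base byte B8h is used only as a sample.

```python
def encode_plus_r(base_opcode, reg_index):
    """Return (rex_b, opcode_byte) for an opcode written with +rb/+rw/+rd/+rq.

    reg_index is a hypothetical 4-bit register number (0-15): bit 3 becomes
    REX.B and bits 2:0 are the Table 2-2 value added to the base opcode.
    """
    if not 0 <= reg_index <= 15:
        raise ValueError("register index must be in the range 0..15")
    if base_opcode & 0x07:
        raise ValueError("base opcode must leave its low three bits clear")
    rex_b = reg_index >> 3                      # 0 for values 0-7, 1 for R8-R15
    opcode = base_opcode + (reg_index & 0x07)   # value added to the base byte
    return rex_b, opcode


# Sample base byte B8h with value 1 (ECX): opcode B9h, REX.B = 0.
rex_b, opcode = encode_plus_r(0xB8, 1)
print(rex_b, format(opcode, "02X"))

# Same base byte with R9D (value 1, REX.B = 1): opcode B9h again.
rex_b, opcode = encode_plus_r(0xB8, 9)
print(rex_b, format(opcode, "02X"))
```

The sketch does not model the AH/SPL, CH/BPL, DH/SIL, and BH/DIL distinction for +rb values 4 through 7, which depends on whether a REX prefix is present.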


2.5.3 Pseudocode Definition 


Pseudocode examples are given for the actions of several complex instructions (for example, see 
“CALL (Near)” on page 130). The following definitions apply to all such pseudocode examples: 


//////////////////////////////////////////////////////////////////////////////
// Pseudo Code Definition
//////////////////////////////////////////////////////////////////////////////
//
// Comments start with double slashes.
//
// '=' can mean "is", or assignment based on context
// '==' is the equals comparison operator
//

//////////////////////////////////////////////////////////////////////////////
// Constants
//////////////////////////////////////////////////////////////////////////////


0            // numbers are in base-10 (decimal), unless followed by a suffix
0000_0001b   // a number in binary notation, underbars added for readability
FFE0_0000h   // a number expressed in hexadecimal notation

// in the following, '&&' is the logical AND operator. See "Logical Operators"
// below.

// reg[fld] identifies a field (one or more bits) within architected register
// or within a sub-element of a larger data structure. A dot separates the
// higher-level data structure name from the sub-element name.

CS.desc = Code Segment descriptor    // CS.desc has sub-elements: base, limit, attr
SS.desc = Stack Segment descriptor   // SS.desc has the same sub-elements
CS.desc.base = base subfield of CS.desc
CS = Code Segment Register
SS = Stack Segment Register
CPL = Current Privilege Level (0 <= CPL <= 3)

REAL_MODE = (CR0[PE] == 0)
PROTECTED_MODE = ((CR0[PE] == 1) && (RFLAGS[VM] == 0))
VIRTUAL_MODE = ((CR0[PE] == 1) && (RFLAGS[VM] == 1))
LEGACY_MODE = (EFER[LMA] == 0)
LONG_MODE = (EFER[LMA] == 1)
64BIT_MODE = ((EFER[LMA] == 1) && (CS_desc.attr[L] == 1) && (CS_desc.attr[D] == 0))
COMPATIBILITY_MODE = ((EFER[LMA] == 1) && (CS_desc.attr[L] == 0))
PAGING_ENABLED = (CR0[PG] == 1)
ALIGNMENT_CHECK_ENABLED = ((CR0[AM] == 1) && (RFLAGS[AC] == 1) && (CPL == 3))
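
The mode definitions above are plain boolean expressions over a handful of control bits, so they can be mirrored directly in a small Python sketch. The function name mode_predicates and its integer arguments are hypothetical stand-ins for CR0[PE], RFLAGS[VM], EFER[LMA], and the L and D attribute bits of the code-segment descriptor. The sketch only restates the definitions and does not check whether a given bit combination is architecturally reachable.

```python
def mode_predicates(cr0_pe, rflags_vm, efer_lma, cs_l, cs_d):
    """Evaluate the mode definitions for the given bit values (each 0 or 1)."""
    return {
        "REAL_MODE": cr0_pe == 0,
        "PROTECTED_MODE": cr0_pe == 1 and rflags_vm == 0,
        "VIRTUAL_MODE": cr0_pe == 1 and rflags_vm == 1,
        "LEGACY_MODE": efer_lma == 0,
        "LONG_MODE": efer_lma == 1,
        "64BIT_MODE": efer_lma == 1 and cs_l == 1 and cs_d == 0,
        "COMPATIBILITY_MODE": efer_lma == 1 and cs_l == 0,
    }


# Bit values for a code segment with L = 1 and D = 0 while EFER[LMA] = 1.
modes = mode_predicates(cr0_pe=1, rflags_vm=0, efer_lma=1, cs_l=1, cs_d=0)
print(sorted(name for name, active in modes.items() if active))
```

Because the definitions are independent predicates, more than one of them can hold at once; in the usage above, LONG_MODE, 64BIT_MODE, and PROTECTED_MODE all evaluate to TRUE.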


OPERAND_SIZE = 16, 32, or 64 // size, in bits, of an operand 

// OPERAND_SIZE depends on processor mode, the current code segment descriptor 
// default operand size [D], presence of the operand size override prefix (66h) 
// and, in 64-bit mode, the REX prefix. 

// NOTE: Specific instructions take 8-bit operands, but for these instructions, 
// operand size is fixed and the variable OPERAND_SIZE is not needed. 


ADDRESS_SIZE = 16, 32, or 64 // size, in bits, of the effective address for
// memory reads. ADDRESS_SIZE depends on processor mode, the current code segment
// descriptor default operand size [D], and the presence of the address size
// override prefix (67h)


STACK_SIZE = 16, 32, or 64 // size, in bits of stack operation operand 
// STACK_SIZE depends on current code segment descriptor attribute D bit and 
// the Stack Segment descriptor attribute B bit. 


//////////////////////////////////////////////////////////////////////////////
// Architected Registers
//////////////////////////////////////////////////////////////////////////////
// Identified using abbreviated names assigned by the Architecture; can represent 
// the register or its contents depending on context. 

RAX = the 64-bit contents of the general-purpose register 

EAX = 32-bit contents of GPR EAX 


AX = 16-bit contents of GPR AX 
AL = lower 8 bits of GPR AX 
AH = upper 8 bits of GPR AX 


index_of(reg) = value used to encode the register. 
index_of(AX) = 0000b 
index_of(RAX) = 0000b 


// in legacy and compatibility modes the msb of the index is fixed as 0 


//////////////////////////////////////////////////////////////////////////////
// Defined Variables
//////////////////////////////////////////////////////////////////////////////

old_RIP = RIP at the start of current instruction
old_RSP = RSP at the start of current instruction
old_RFLAGS = RFLAGS at the start of the instruction
old_CS = CS selector at the start of current instruction

old_DS = DS selector at the start of current instruction 

old_ES = ES selector at the start of current instruction 

old_FS = FS selector at the start of current instruction 

old_GS = GS selector at the start of current instruction 

old_SS = SS selector at the start of current instruction 

RIP = the current RIP register

RSP = the current RSP register

RBP = the current RBP register 

RFLAGS = the current RFLAGS register 
next_RIP = RIP at start of next instruction 


CS.desc = the current CS descriptor, including the subfields: 
base limit attr 

SS.desc = the current SS descriptor, including the subfields: 
base limit attr 


SRC = the instruction’s source operand 


SRC1 = the instruction's first source operand 

SRC2 = the instruction's second source operand 

SRC3 = the instruction's third source operand 

IMM8 = 8-bit immediate encoded in the instruction 
IMM16 = 16-bit immediate encoded in the instruction 
IMM32 = 32-bit immediate encoded in the instruction 
IMM64 = 64-bit immediate encoded in the instruction 


DEST = instruction’s destination register 


temp_* // 64-bit temporary register 

temp_*_desc // temporary descriptor, with sub-elements:
            // if it points to a block of memory: base limit attr
            // if it's a gate descriptor: offset segment attr


NULL = 0000h // null selector is all zeros 


//////////////////////////////////////////////////////////////////////////////
// Exceptions
//////////////////////////////////////////////////////////////////////////////
EXCEPTION [#GP(0)] // Signals an exception; error code in parenthesis 

EXCEPTION [#UD] // if no error code 


// possible exception types: 

#DE // Divide-By-Zero-Error Exception (Vector 0) 
#DB // Debug Exception (Vector 1) 

#BP // INT3 Breakpoint Exception (Vector 3) 

#OF // INTO Overflow Exception (Vector 4)

#BR // Bound-Range Exception (Vector 5) 

#UD // Invalid-Opcode Exception (Vector 6) 

#NM // Device-Not-Available Exception (Vector 7) 
#DF // Double-Fault Exception (Vector 8) 

#TS // Invalid-TSS Exception (Vector 10)

#NP // Segment-Not-Present Exception (Vector 11)

#SS // Stack Exception (Vector 12) 

#GP // General-Protection Exception (Vector 13) 

#PF // Page-Fault Exception (Vector 14) 

#MF // x87 Floating-Point Exception-Pending (Vector 16) 
#AC // Alignment-Check Exception (Vector 17) 

#MC // Machine-Check Exception (Vector 18) 

#XF // SIMD Floating-Point Exception (Vector 19) 


//////////////////////////////////////////////////////////////////////////////
// Implicit Assignments
//////////////////////////////////////////////////////////////////////////////

// V,Z,A,S are integer variables, assigned a value when an instruction begins
// executing (they can be assigned a different value in the middle of an
// instruction, if needed)

IF (OPERAND_SIZE == 16) V = 2
IF (OPERAND_SIZE == 32) V = 4
IF (OPERAND_SIZE == 64) V = 8
IF (OPERAND_SIZE == 16) Z = 2
IF (OPERAND_SIZE == 32) Z = 4
IF (OPERAND_SIZE == 64) Z = 4
IF (ADDRESS_SIZE == 16) A = 2
IF (ADDRESS_SIZE == 32) A = 4
IF (ADDRESS_SIZE == 64) A = 8
IF (STACK_SIZE == 16) S = 2
IF (STACK_SIZE == 32) S = 4
IF (STACK_SIZE == 64) S = 8
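
The implicit size variables can be sketched in Python as a conversion from a size in bits to a size in bytes. The function name size_variables is hypothetical. The byte values are an assumption inferred from the move suffixes described under "Moving Data From One Register To Another", where the .v, .a, and .s forms have 2-, 4-, and 8-byte variants while the .z form has only 2- and 4-byte variants, so Z is capped at 4 here.

```python
def size_variables(operand_size, address_size, stack_size):
    """Return (V, Z, A, S) in bytes for sizes given in bits (16, 32, or 64)."""
    for bits in (operand_size, address_size, stack_size):
        if bits not in (16, 32, 64):
            raise ValueError("sizes must be 16, 32, or 64 bits")
    v = operand_size // 8
    z = min(operand_size // 8, 4)   # assumption: Z never exceeds 4 bytes
    a = address_size // 8
    s = stack_size // 8
    return v, z, a, s


# 64-bit operand, address, and stack sizes.
print(size_variables(64, 64, 64))
```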
//////////////////////////////////////////////////////////////////////////////
// Bit Range Inside a Register
//////////////////////////////////////////////////////////////////////////////


temp_data[x:y] // Bits x through y (inclusive) of temp_data 
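
The bit-range notation can be read as a shift followed by a mask. The following Python sketch uses a hypothetical helper named bits and assumes that x is the more significant bit position and y the less significant one, as in a range such as [63:32].

```python
def bits(value, x, y):
    """Return bits x through y (inclusive) of value, assuming x >= y >= 0."""
    if not x >= y >= 0:
        raise ValueError("expected x >= y >= 0")
    width = x - y + 1
    return (value >> y) & ((1 << width) - 1)   # shift down, then mask to width


temp_data = 0xABCD
print(hex(bits(temp_data, 15, 8)))   # upper byte of the 16-bit value
print(hex(bits(temp_data, 3, 0)))    # lowest four bits
```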

//////////////////////////////////////////////////////////////////////////////
// Variables and data types
//////////////////////////////////////////////////////////////////////////////


NxtValue = 5 //default data type is unsigned int. 

int //abstract data type representing an integer 

bool //abstract data type; either TRUE or FALSE 

vector //An array of data elements. Individual elements are accessed via 
//an unsigned integer zero-based index. Elements have a data type. 

bit //a single bit 

byte //8-bit value 

word //16-bit value 

doubleword //32-bit value 

quadword //64-bit value 

octword //128-bit value 


double octword //256-bit value 




unsigned int aval //treat aval as an unsigned integer value 


signed int valx //treat valx as a signed integer value 
bit vector b_vect //b_vect is an array of data elements. Each element is a bit. 
b_vect[5] //The sixth element (bit) in the array. Indices are 0-based. 


//////////////////////////////////////////////////////////////////////////////
// Elements Within a packed data type
//////////////////////////////////////////////////////////////////////////////

// element i (zero-based) of size w occupies bits [w(i+1)-1:wi]
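
Element selection within a packed value can be sketched in Python as follows. The helper name element is hypothetical, and the sketch assumes zero-based element numbering with element 0 in the least-significant bits, so that element i of width w spans bit w*i up through bit w*(i+1)-1.

```python
def element(packed, i, w):
    """Return zero-based element i of width w bits from a packed integer."""
    if i < 0 or w <= 0:
        raise ValueError("expected i >= 0 and w > 0")
    return (packed >> (w * i)) & ((1 << w) - 1)


packed = 0x44332211
print(hex(element(packed, 0, 8)))    # byte element 0
print(hex(element(packed, 1, 16)))   # word element 1
```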


//////////////////////////////////////////////////////////////////////////////
// Moving Data From One Register To Another
//////////////////////////////////////////////////////////////////////////////
temp_dest.b = temp_src; // 1-byte move (copies lower 8 bits of temp_src to
                        // temp_dest, preserving the upper 56 bits of temp_dest)
temp_dest.w = temp_src; // 2-byte move (copies lower 16 bits of temp_src to
                        // temp_dest, preserving the upper 48 bits of temp_dest)
temp_dest.d = temp_src; // 4-byte move (copies lower 32 bits of temp_src to
                        // temp_dest; zeros out the upper 32 bits of temp_dest)
temp_dest.q = temp_src; // 8-byte move (copies all 64 bits of temp_src to
                        // temp_dest)
temp_dest.v = temp_src; // 2-byte move if V==2
                        // 4-byte move if V==4
                        // 8-byte move if V==8
temp_dest.z = temp_src; // 2-byte move if Z==2
                        // 4-byte move if Z==4
temp_dest.a = temp_src; // 2-byte move if A==2
                        // 4-byte move if A==4
                        // 8-byte move if A==8
temp_dest.s = temp_src; // 2-byte move if S==2
                        // 4-byte move if S==4
                        // 8-byte move if S==8
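
The fixed-size move suffixes differ in how they treat the upper bits of the 64-bit destination. A minimal Python sketch of the .b, .w, .d, and .q forms follows; the function name move and the use of plain integers as registers are illustrative assumptions.

```python
MASK64 = (1 << 64) - 1


def move(dest, src, size_bytes):
    """Return the new 64-bit value of dest after moving size_bytes from src."""
    if size_bytes in (1, 2):
        low = (1 << (8 * size_bytes)) - 1
        # .b and .w: replace the low bits, preserve the rest of dest
        return (dest & ~low & MASK64) | (src & low)
    if size_bytes == 4:
        # .d: copy the low 32 bits and zero the upper 32 bits of dest
        return src & 0xFFFF_FFFF
    if size_bytes == 8:
        # .q: copy all 64 bits
        return src & MASK64
    raise ValueError("size must be 1, 2, 4, or 8 bytes")


dest = 0x1111_2222_3333_4444
src = 0xAAAA_BBBB_CCCC_DDDD
for size in (1, 2, 4, 8):
    print(size, format(move(dest, src, size), "016X"))
```

The variable-size suffixes (.v, .z, .a, .s) select one of these same behaviors according to the corresponding size variable.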
//////////////////////////////////////////////////////////////////////////////
// Arithmetic Operators
//////////////////////////////////////////////////////////////////////////////

a + b // integer addition
a - b // integer subtraction
a * b // integer multiplication
a / b // integer division. Result is the quotient
a % b // modulo. Result is the remainder after a is divided by b


// multiplication has precedence over addition where precedence is not explicitly 
// indicated by grouping terms with parentheses 


//////////////////////////////////////////////////////////////////////////////
// Bitwise Operators
//////////////////////////////////////////////////////////////////////////////
// temp, a, and b are values or register contents of the same size

temp = a AND b; // Corresponding bits of a and b are logically ANDed together
temp = a OR b;  // Corresponding bits of a and b are logically ORed together
temp = a XOR b; // Each bit of temp is the exclusive OR of the corresponding
                // bits of a and b
temp = NOT a;   // Each bit of temp is the complement of the corresponding
                // bit of a

// Concatenation
value = {field1, field2, 100b}; // pack values of field1, field2 and 100b
size_of(value) = (size_of(field1) + size_of(field2) + 3)
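
Concatenation can be sketched in Python by shifting each field into place. The helper name concat is hypothetical, each field is passed as a (value, width in bits) pair, and the sketch assumes that the left-most field in the braces lands in the most-significant bits of the result, which the definition above does not state explicitly.

```python
def concat(*fields):
    """Concatenate (value, width) pairs, left-most field most significant.

    Returns (value, total_width_in_bits).
    """
    value = 0
    total = 0
    for field_value, width in fields:
        if field_value < 0 or field_value >> width:
            raise ValueError("field value does not fit in its width")
        value = (value << width) | field_value
        total += width
    return value, total


# field1 is 2 bits wide, field2 is 1 bit wide, followed by the constant 100b.
value, size = concat((0b11, 2), (0b0, 1), (0b100, 3))
print(format(value, "b"), size)
```

The returned width matches the size_of relation above: the sum of the two field widths plus 3.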


//////////////////////////////////////////////////////////////////////////////
// Logical Shift Operators
//////////////////////////////////////////////////////////////////////////////

temp = a << b; // Result is a shifted left by b bit positions. Zeros are
               // shifted into vacant positions. Bits shifted out are lost.
temp = a >> b; // Result is a shifted right by b bit positions. Zeros are
               // shifted into vacant positions. Bits shifted out are lost.


//////////////////////////////////////////////////////////////////////////////
// Logical Operators
//////////////////////////////////////////////////////////////////////////////
// a boolean variable can assume one of two values (TRUE or FALSE) 

// In these examples, FOO, BAR, CONE, and HEAD have been defined to be boolean 

// variables 

FOO && BAR // Logical AND 

FOO || BAR // Logical OR 

!FOO // Logical complement (NOT) 


////////////////////////////////////////////////////////////////////////////////
// Comparison Operators
////////////////////////////////////////////////////////////////////////////////
// a and b are integer values. The result is a boolean value.

a == b  // if a and b are equal, the result is TRUE; otherwise it is FALSE.
a != b  // if a and b are not equal, the result is TRUE; otherwise it is FALSE.
a > b   // if a is greater than b, the result is TRUE; otherwise it is FALSE.
a < b   // if a is less than b, the result is TRUE; otherwise it is FALSE.
a >= b  // if a is greater than or equal to b, the result is TRUE; otherwise
        // it is FALSE.
a <= b  // if a is less than or equal to b, the result is TRUE; otherwise
        // it is FALSE.

////////////////////////////////////////////////////////////////////////////////
// Logical Expressions
////////////////////////////////////////////////////////////////////////////////
// Logical binary (two operand) and unary (one operand) operators can be combined
// with comparison operators to form more complex expressions. Parentheses are
// used to enclose comparison terms and to show precedence. If precedence is not
// explicitly shown, logical AND has precedence over logical OR. Unary operators
// have precedence over binary operators.

FOO && (a < b) || !BAR // evaluate the comparison a < b first, then
                       // AND this with FOO. Finally OR this intermediate result
                       // with the complement of BAR.

// Logical expressions can be English phrases that can be evaluated to be TRUE
// or FALSE. Statements assume knowledge of the system architecture (Volumes 1 and
// 2).
////////////////////////////////////////////////////////////////////////////////

IF (it is raining)
    close the window
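
The precedence rule stated for FOO && (a < b) || !BAR can be checked mechanically. The sketch below is an illustration only; it relies on Python's `and`, `or`, and `not` having the same relative precedence as the pseudocode operators, and the function names are hypothetical.

```python
from itertools import product

def as_written(foo, bar, a, b):
    # FOO && (a < b) || !BAR, relying on precedence only
    return foo and (a < b) or not bar

def explicit(foo, bar, a, b):
    # The same expression with the evaluation order spelled out
    return (foo and (a < b)) or (not bar)

def wrong_grouping(foo, bar, a, b):
    # What the expression would mean if OR bound more tightly than AND
    return foo and ((a < b) or (not bar))

cases = list(product([False, True], [False, True], [(1, 2), (2, 1)]))
assert all(as_written(f, r, a, b) == explicit(f, r, a, b) for f, r, (a, b) in cases)
print(as_written(False, False, 1, 2), wrong_grouping(False, False, 1, 2))  # prints True False
```

The last line shows one input where the two groupings disagree, which is why the manual states the default precedence explicitly.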


////////////////////////////////////////////////////////////////////////////////
// Assignment Operators
////////////////////////////////////////////////////////////////////////////////
a = a + b  // The value a is assigned the sum of the values a and b
temp = R1  // The contents of the register temp are replaced by a copy of the
           // contents of register R1.
R0 += 2    // R0 is assigned the sum of the contents of R0 and the integer 2.
R5 |= R6   // R5 is assigned the result of the bit-wise OR of the contents of R5
           // and R6. Contents of R6 are unchanged.
R4 &= R7   // R4 is assigned the result of the bit-wise AND of the contents of
           // R4 and R7. Contents of R7 are unchanged.
////////////////////////////////////////////////////////////////////////////////
// IF-THEN-ELSE
////////////////////////////////////////////////////////////////////////////////
IF (FOO) <expression>       // evaluation of <expression> is dependent on FOO
                            // being TRUE. If FOO is FALSE, <expression> is not
                            // evaluated.

IF (FOO)
    <dependent expression1> // scope of IF is indicated by indentation

    <dependent expressionx>

IF (FOO)                    // If FOO is TRUE, <dependent expression> is
                            // evaluated and the remaining ELSIF and ELSE
    <dependent expression>  // clauses are skipped.
                            //
ELSIF (BAR)                 // If FOO is FALSE and BAR is TRUE, <alt expression>
    <alt expression>        // is evaluated and the subsequent ELSIF or ELSE
                            // clauses are skipped.
ELSE
    <default expressions>   // evaluated if all the preceding IF and ELSIF
                            // conditions are FALSE.

IF ((FOO && BAR) || (CONE && HEAD)) // The condition can be an expression.
    <dependent expressions>



////////////////////////////////////////////////////////////////////////////////
// Loops
////////////////////////////////////////////////////////////////////////////////
FOR i = <init_val> to <final_val>, BY <step>
    <expression>            // scope of loop is indicated by indentation
                            // if <step> = 1, may omit "BY" clause

// nested loop example

temp = 0                    // initialize temp
FOR i = 0 to 7              // i takes on the values 0 through 7 in succession
    temp += 1               // In the outer loop. Evaluated a total of 8 times.
    FOR j = 0 to 7, BY 2    // j takes on the values 0, 2, 4, and 6; but not 7.
        <inner-most exp>    // This will be evaluated a total of 8 * 4 times.

<next expression outside both loops>
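
The iteration counts quoted in the nested loop example can be confirmed with a direct transcription. This sketch assumes, as the comments in the example imply, that the final value of a FOR loop is inclusive; the counter name `inner_evaluations` is hypothetical.

```python
temp = 0
inner_evaluations = 0
for i in range(0, 7 + 1):          # FOR i = 0 to 7 (final value is inclusive)
    temp += 1                      # outer-loop body
    for j in range(0, 7 + 1, 2):   # FOR j = 0 to 7, BY 2 -> 0, 2, 4, 6
        inner_evaluations += 1     # stands in for <inner-most exp>

print(temp, inner_evaluations)  # prints 8 32
```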
// C Language form of loop syntax is also allowed

FOR (i = 0; i < MAX; i++)
{
    <expressions>           // evaluated MAX times
}

////////////////////////////////////////////////////////////////////////////////
// Functions
////////////////////////////////////////////////////////////////////////////////
// Syntax for function definition
<return data type> <function_name>(argument, ..)

    <expressions>
    RETURN <result>


////////////////////////////////////////////////////////////////////////////////
// Built-in Functions
////////////////////////////////////////////////////////////////////////////////

SignExtend(arg) // returns value of _arg_ sign extended to the width of the data
                // type of the function. Data type of function is inferred from
                // the context of the function's invocation.

ZeroExtend(arg) // returns value of _arg_ zero extended to the width of the data
                // type of the function. Data type of function is inferred from
                // the context of the function's invocation.

indexof(reg)    // returns binary value used to encode reg specification
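
As a rough illustration of the two extension functions, the Python sketch below makes the source and destination widths explicit. In the manual's pseudocode these widths are inferred from context, so the `from_bits` and `to_bits` parameters here are hypothetical additions.

```python
def zero_extend(value, from_bits, to_bits):
    """Keep the low from_bits bits; every higher bit of the result is 0."""
    assert to_bits >= from_bits
    return value & ((1 << from_bits) - 1)

def sign_extend(value, from_bits, to_bits):
    """Replicate bit (from_bits - 1) into bits to_bits-1 .. from_bits."""
    assert to_bits >= from_bits
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):                     # sign bit set
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value

print(hex(sign_extend(0x80, 8, 32)))  # prints 0xffffff80
print(hex(zero_extend(0x80, 8, 32)))  # prints 0x80
```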


////////////////////////////////////////////////////////////////////////////////
// READ_MEM
// General memory read. This zero-extends the data to 64 bits and returns it.
////////////////////////////////////////////////////////////////////////////////

usage:
    temp = READ_MEM.x [seg:offset]  // where x is one of {v, z, b, w, d, q}
                                    // and denotes the size of the memory read

definition:

    IF ((seg AND 0xFFFC) == NULL)
        // #GP fault for using a null segment to reference memory
        EXCEPTION [#GP(0)]

    IF ((seg==CS) || (seg==DS) || (seg==ES) || (seg==FS) || (seg==GS))
        // CS,DS,ES,FS,GS check for segment limit or canonical
        IF ((!64BIT_MODE) && (offset is outside seg’s limit))
            // #GP fault for segment limit violation in non-64-bit mode
            EXCEPTION [#GP(0)]
        IF ((64BIT_MODE) && (offset is non-canonical))
            // #GP fault for non-canonical address in 64-bit mode
            EXCEPTION [#GP(0)]
    ELSIF (seg==SS) // SS checks for segment limit or canonical
        IF ((!64BIT_MODE) && (offset is outside seg’s limit))
            // stack fault for segment limit violation in non-64-bit mode
            EXCEPTION [#SS(0)]
        IF ((64BIT_MODE) && (offset is non-canonical))
            // stack fault for non-canonical address in 64-bit mode
            EXCEPTION [#SS(0)]
    ELSE // ((seg==GDT) || (seg==LDT) || (seg==IDT) || (seg==TSS))
        // GDT,LDT,IDT,TSS check for segment limit and canonical
        IF (offset > seg.limit)
            // #GP fault for segment limit violation in all modes
            EXCEPTION [#GP(0)]
        IF ((LONG_MODE) && (offset is non-canonical))
            // #GP fault for non-canonical address in long mode
            EXCEPTION [#GP(0)]

    IF ((ALIGNMENT_CHECK_ENABLED) && (offset misaligned, considering its
         size and alignment))
        EXCEPTION [#AC(0)]

    IF ((64BIT_MODE) && ((seg==CS) || (seg==DS) || (seg==ES) || (seg==SS)))
        temp_linear = offset
    ELSE
        temp_linear = seg.base + offset

    IF ((PAGING_ENABLED) && (virtual-to-physical translation for temp_linear
         results in a page-protection violation))
        EXCEPTION [#PF(error_code)] // page fault for page-protection violation
                                    // (U/S violation, Reserved bit violation)

    IF ((PAGING_ENABLED) && (temp_linear is on a not-present page))
        EXCEPTION [#PF(error_code)] // page fault for not-present page

    temp_data = memory [temp_linear].x  // zero-extends the data to 64
                                        // bits, and saves it in temp_data

    RETURN (temp_data)                  // return the zero-extended data
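
Two steps of the READ_MEM definition lend themselves to a small executable model: the canonical-address test and the computation of temp_linear. The sketch below is illustrative only. It assumes a 48-bit virtual address width (the actual width is implementation-dependent), and the function and parameter names are hypothetical.

```python
VA_BITS = 48  # assumption: 48 implemented virtual-address bits

def is_canonical(addr, va_bits=VA_BITS):
    """True if bits 63 down to (va_bits - 1) of a 64-bit address are all equal."""
    upper = addr >> (va_bits - 1)
    return upper == 0 or upper == (1 << (64 - va_bits + 1)) - 1

def linear_address(seg_name, seg_base, offset, mode_64bit):
    """Mirror of the temp_linear step in the READ_MEM pseudocode."""
    if mode_64bit and seg_name in ("CS", "DS", "ES", "SS"):
        return offset              # segment base is not added
    return seg_base + offset       # FS, GS, and all non-64-bit-mode accesses

print(is_canonical(0x0000_7FFF_FFFF_FFFF))  # prints True
print(is_canonical(0x0000_8000_0000_0000))  # prints False
print(hex(linear_address("FS", 0x1000, 0x20, True)))  # prints 0x1020
```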


////////////////////////////////////////////////////////////////////////////////
// WRITE_MEM // General memory write
////////////////////////////////////////////////////////////////////////////////

usage:
    WRITE_MEM.x [seg:offset] = temp.x   // where x is one of these:
                                        // {v, z, b, w, d, q} and denotes the
                                        // size of the memory write

definition:

    IF ((seg & 0xFFFC) == NULL)     // #GP fault for using a null segment
                                    // to reference memory
        EXCEPTION [#GP(0)]

    IF ((seg==CS) || (seg==DS) || (seg==ES) || (seg==FS) || (seg==GS))
        // CS,DS,ES,FS,GS check for segment limit or canonical
        IF ((!64BIT_MODE) && (offset is outside seg’s limit))
            // #GP fault for segment limit violation in non-64-bit mode
            EXCEPTION [#GP(0)]
        IF ((64BIT_MODE) && (offset is non-canonical))
            // #GP fault for non-canonical address in 64-bit mode
            EXCEPTION [#GP(0)]
    ELSIF (seg==SS) // SS checks for segment limit or canonical
        IF ((!64BIT_MODE) && (offset is outside seg’s limit))
            // stack fault for segment limit violation in non-64-bit mode
            EXCEPTION [#SS(0)]
        IF ((64BIT_MODE) && (offset is non-canonical))
            // stack fault for non-canonical address in 64-bit mode
            EXCEPTION [#SS(0)]
    ELSE // ((seg==GDT) || (seg==LDT) || (seg==IDT) || (seg==TSS))
        // GDT,LDT,IDT,TSS check for segment limit and canonical
        IF (offset > seg.limit)
            // #GP fault for segment limit violation in all modes
            EXCEPTION [#GP(0)]
        IF ((LONG_MODE) && (offset is non-canonical))
            // #GP fault for non-canonical address in long mode
            EXCEPTION [#GP(0)]

    IF ((ALIGNMENT_CHECK_ENABLED) && (offset is misaligned, considering
         its size and alignment))
        EXCEPTION [#AC(0)]

    IF ((64BIT_MODE) && ((seg==CS) || (seg==DS) || (seg==ES) || (seg==SS)))
        temp_linear = offset
    ELSE
        temp_linear = seg.base + offset

    IF ((PAGING_ENABLED) && (the virtual-to-physical translation for
         temp_linear results in a page-protection violation))
        EXCEPTION [#PF(error_code)]
            // page fault for page-protection violation
            // (U/S violation, Reserved bit violation)

    IF ((PAGING_ENABLED) && (temp_linear is on a not-present page))
        EXCEPTION [#PF(error_code)] // page fault for not-present page

    memory [temp_linear].x = temp.x // write the bytes to memory



////////////////////////////////////////////////////////////////////////////////
// PUSH // Write data to the stack
////////////////////////////////////////////////////////////////////////////////

usage:
    PUSH.x temp     // where x is one of these: {v, z, b, w, d, q} and
                    // denotes the size of the push
definition:
    WRITE_MEM.x [SS:RSP.s - X] = temp.x // write to the stack
    RSP.s = RSP - X                     // point RSP to the data just written

////////////////////////////////////////////////////////////////////////////////
// POP // Read data from the stack, zero-extend it to 64 bits
////////////////////////////////////////////////////////////////////////////////

usage:
    POP.x temp      // where x is one of these: {v, z, b, w, d, q} and
                    // denotes the size of the pop
definition:
    temp = READ_MEM.x [SS:RSP.s]        // read from the stack
    RSP.s = RSP + X                     // point RSP above the data just read
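
The PUSH and POP definitions can be exercised with a toy model. The following Python sketch is a simplification under stated assumptions: the `Stack` class and its dictionary-backed memory are hypothetical, segment checks and faults are omitted, and data is stored little-endian as on x86.

```python
class Stack:
    def __init__(self, rsp):
        self.rsp = rsp
        self.memory = {}                    # byte address -> byte value

    def push(self, value, size):
        target = self.rsp - size            # WRITE_MEM.x [SS:RSP.s - X] = temp.x
        for i in range(size):
            self.memory[target + i] = (value >> (8 * i)) & 0xFF
        self.rsp = target                   # RSP.s = RSP - X

    def pop(self, size):
        value = sum(self.memory[self.rsp + i] << (8 * i) for i in range(size))
        self.rsp += size                    # RSP.s = RSP + X
        return value                        # already zero-extended

s = Stack(0x1000)
s.push(0x1122334455667788, 8)
s.push(0xBEEF, 2)
print(hex(s.rsp))      # prints 0xff6
print(hex(s.pop(2)))   # prints 0xbeef
```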


////////////////////////////////////////////////////////////////////////////////
// READ_DESCRIPTOR // Read 8-byte descriptor from GDT/LDT, return the descriptor
////////////////////////////////////////////////////////////////////////////////

usage:
    temp_descriptor = READ_DESCRIPTOR (selector, chktype)
        // chktype field is one of the following:
        // cs_chk     used for far call and far jump
        // clg_chk    used when reading CS for far call or far jump through call gate
        // ss_chk     used when reading SS
        // iret_chk   used when reading CS for IRET or RETF
        // intcs_chk  used when reading the CS for interrupts and exceptions

definition:

    temp_offset = selector AND 0xfff8   // upper 13 bits give an offset
                                        // in the descriptor table

    IF (selector.TI == 0)
        temp_desc = READ_MEM.q [gdt:temp_offset]
            // read 8 bytes from the GDT, split it into
            // (base,limit,attr) if the type bits
            // indicate a block of memory, or split
            // it into (segment,offset,attr) if the type
            // bits indicate a gate, and save the result
            // in temp_desc
    ELSE
        temp_desc = READ_MEM.q [ldt:temp_offset]
            // read 8 bytes from the LDT, split it into
            // (base,limit,attr) if the type bits
            // indicate a block of memory, or split
            // it into (segment,offset,attr) if the type
            // bits indicate a gate, and save the result
            // in temp_desc

    IF (selector.rpl or temp_desc.attr.dpl is illegal for the current mode/cpl)
        EXCEPTION [#GP(selector)]

    IF (temp_desc.attr.type is illegal for the current mode/chktype)
        EXCEPTION [#GP(selector)]

    IF (temp_desc.attr.p==0)
        EXCEPTION [#NP(selector)]

    RETURN (temp_desc)
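
The selector fields used by READ_DESCRIPTOR (RPL in bits 1:0, TI in bit 2, and the table index in bits 15:3) can be pulled apart as follows. This is a minimal sketch; the function name and the dictionary keys are hypothetical.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into the fields READ_DESCRIPTOR uses."""
    return {
        "rpl": selector & 0x3,              # requested privilege level
        "ti": (selector >> 2) & 0x1,        # 0 = GDT, 1 = LDT
        "index": selector >> 3,             # descriptor index
        "table_offset": selector & 0xFFF8,  # temp_offset = selector AND 0xfff8
    }

fields = decode_selector(0x002B)
print(fields)  # prints {'rpl': 3, 'ti': 0, 'index': 5, 'table_offset': 40}
```

Because each legacy descriptor is 8 bytes, masking off the low three bits yields the same value as index * 8.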


////////////////////////////////////////////////////////////////////////////////
// READ_IDT // Read an 8-byte descriptor from the IDT, return the descriptor
////////////////////////////////////////////////////////////////////////////////

usage:
    temp_idt_desc = READ_IDT (vector)
        // "vector" is the interrupt vector number

definition:
    IF (LONG_MODE) // long-mode idt descriptors are 16 bytes long
        temp_offset = vector*16
    ELSE // (LEGACY_MODE) legacy-protected-mode idt descriptors are 8 bytes long
        temp_offset = vector*8

    // read 8 bytes from the idt, split it into
    // (segment,offset,attr), and save it in temp_desc
    temp_desc = READ_MEM.q [idt:temp_offset]

    IF (temp_desc.attr.dpl is illegal for the current mode/cpl)
        // exception, with error code that indicates this IDT gate
        EXCEPTION [#GP(vector*8+2)]

    IF (temp_desc.attr.type is illegal for the current mode)
        // exception, with error code that indicates this IDT gate
        EXCEPTION [#GP(vector*8+2)]

    IF (temp_desc.attr.p==0)
        // segment-not-present exception, with an error code that
        // indicates this IDT gate
        EXCEPTION [#NP(vector*8+2)]

    RETURN (temp_desc)
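
The two arithmetic expressions in READ_IDT are easy to confuse: the table offset depends on the operating mode, while the error code always uses vector*8+2. The sketch below, with hypothetical function names, shows both. In the selector-format error code, adding 2 sets bit 1, which marks the index as referring to the IDT.

```python
def idt_offset(vector, long_mode):
    """Byte offset of a gate descriptor within the IDT."""
    return vector * (16 if long_mode else 8)

def idt_error_code(vector):
    """Error code reported for a faulty IDT gate: index in bits 15:3, IDT bit (bit 1) set."""
    return vector * 8 + 2

print(idt_offset(14, long_mode=True))    # prints 224
print(idt_offset(14, long_mode=False))   # prints 112
print(hex(idt_error_code(13)))           # prints 0x6a
```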


////////////////////////////////////////////////////////////////////////////////
// READ_INNER_LEVEL_SP
// Read a new stack pointer (RSP or SS:ESP) from the TSS
////////////////////////////////////////////////////////////////////////////////

usage:
    temp_SS_desc:temp_RSP = READ_INNER_LEVEL_SP (new_cpl, ist_index)

definition:
    IF (LONG_MODE)
        IF (ist_index>0)
            temp_RSP = READ_MEM.q [tss:ist_index*8+28] // read ISTn stack
                                                       // pointer from the TSS
        ELSE // (ist_index==0)
            temp_RSP = READ_MEM.q [tss:new_cpl*8+4]    // read RSPn stack
                                                       // pointer from the TSS

        // in long mode, changing to lower cpl sets SS.sel to NULL+new_cpl
        temp_SS_desc.sel = NULL + new_cpl

    ELSE // (LEGACY_MODE)
    {
        temp_RSP = READ_MEM.d [tss:new_cpl*8+4] // read ESPn from the TSS
        temp_sel = READ_MEM.d [tss:new_cpl*8+8] // read SSn from the TSS

        temp_SS_desc = READ_DESCRIPTOR (temp_sel, ss_chk)
    }

    return (temp_RSP:temp_SS_desc)


////////////////////////////////////////////////////////////////////////////////
// READ_BIT_ARRAY // Read 1 bit from a bit array in memory
////////////////////////////////////////////////////////////////////////////////

usage:
    temp_value = READ_BIT_ARRAY ([mem], bit_number)

definition:

    temp_BYTE = READ_MEM.b [mem + (bit_number SHR 3)]
        // read the byte containing the bit

    temp_BIT = temp_BYTE SHR (bit_number & 7)
        // shift the requested bit position into bit 0

    return (temp_BIT & 0x01) // return ‘0’ or ‘1’
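
READ_BIT_ARRAY translates almost line for line into Python. In this sketch the `memory` argument is a hypothetical bytes-like object standing in for READ_MEM.b, and bit_number is assumed to be non-negative.

```python
def read_bit_array(memory, base, bit_number):
    temp_byte = memory[base + (bit_number >> 3)]   # byte containing the bit
    temp_bit = temp_byte >> (bit_number & 7)       # move the bit to position 0
    return temp_bit & 0x01                         # 0 or 1

memory = bytes([0b0000_0001, 0b1000_0000])
print(read_bit_array(memory, 0, 0))    # prints 1
print(read_bit_array(memory, 0, 8))    # prints 0
print(read_bit_array(memory, 0, 15))   # prints 1
```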


////////////////////////////////////////////////////////////////////////////////
// Shadow Stack Functions
////////////////////////////////////////////////////////////////////////////////

define SSTK_ENABLED       (CR4.CET) && (CR0.PE) && (!EFLAGS.VM)
define SSTK_USER_ENABLED  SSTK_ENABLED && (CPL==3) && (U_CET.SH_STK_EN)
define SSTK_SUPV_ENABLED  SSTK_ENABLED && (CPL <3) && (S_CET.SH_STK_EN)

bool ShadowStacksEnabled (privLevel)

    IF (SSTK_ENABLED &&
        (((privLevel == 3) && U_CET.SH_STK_EN) ||
         ((privLevel < 3) && S_CET.SH_STK_EN)))
        RETURN (TRUE)
    ELSE
        RETURN (FALSE)


////////////////////////////////////////////////////////////////////////////////
// SSTK_READ_MEM // read shadow stack memory
// Usage: temp = SSTK_READ_MEM.x [linear_addr]
// where x is either d or q (4 or 8 bytes)
////////////////////////////////////////////////////////////////////////////////

IF (PAGING_ENABLED) && (
       ( the linear address maps to a not-present page )
    || ( the linear address maps to a non-shadow stack page )
    || ( the access is user-mode &&
         the linear address maps to a supervisor shadow stack page )
    || ( the access is supervisor-mode &&
         the linear address maps to a user shadow stack page ))
    EXCEPTION [#PF(error_code)] // page fault, with the SS (shadow stack) bit
                                // set in error_code and the present and
                                // protection violation bits as appropriate
temp_data.x = memory [linear_addr].x
RETURN (temp_data)


////////////////////////////////////////////////////////////////////////////////
// SSTK_WRITE_MEM // write shadow stack memory
// Usage: SSTK_WRITE_MEM.x [linear_addr] = temp.x
// where x is either d or q (4 or 8 bytes)
////////////////////////////////////////////////////////////////////////////////

IF (PAGING_ENABLED) && (
       ( the linear address maps to a not-present page )
    || ( the linear address maps to a non-shadow stack page )
    || ( the access is user-mode &&
         the linear address maps to a supervisor shadow stack page )
    || ( the access is supervisor-mode &&
         the linear address maps to a user shadow stack page ))
    EXCEPTION [#PF(error_code)] // page fault, with the SS (shadow stack) bit
                                // set in error_code and the present and
                                // protection violation bits as appropriate

memory [linear_addr].x = temp.x


////////////////////////////////////////////////////////////////////////////////
// SET_SSTK_TOKEN_BUSY (new_SSP)
// Checks shadow stack token and if valid set the token's busy bit
// Usage: SET_SSTK_TOKEN_BUSY (new_SSP)
////////////////////////////////////////////////////////////////////////////////

IF (new_SSP[2:0] != 0)                  // new SSP must be 8-byte aligned
    EXCEPTION [#GP(0)]

// check shadow stack token and set busy
bool FAULT = FALSE
< start atomic section >
temp_Token = SSTK_READ_MEM.q [new_SSP]  // fetch token with locked read
IF ((!64-bit mode) && (temp_Token[63:32] != 0))
    FAULT = TRUE                        // address in token must be <4GB
                                        // in legacy/compatibility mode
IF ((temp_Token AND 0x01) != 0)
    FAULT = TRUE                        // token busy bit must be 0
IF ((temp_Token AND ~0x01) != new_SSP)
    FAULT = TRUE                        // address in token must match new SSP
IF (!FAULT)
    temp_Token = temp_Token OR 0x01     // if no faults, set token busy bit
SSTK_WRITE_MEM.q [new_SSP] = temp_Token // write token and unlock
< end atomic section >
IF (FAULT)
    EXCEPTION [#GP(0)]
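
The token checks performed by SET_SSTK_TOKEN_BUSY can be followed step by step in a small Python model. This is a sketch under explicit assumptions: the shadow stack is a hypothetical dictionary mapping addresses to 64-bit values, the atomic section is not modeled, and the exception class is invented for illustration.

```python
class GeneralProtectionFault(Exception):
    """Stand-in for EXCEPTION [#GP(0)]."""

def set_sstk_token_busy(shadow_stack, new_ssp, mode_64bit=True):
    if new_ssp & 0x7:                               # new SSP must be 8-byte aligned
        raise GeneralProtectionFault("SSP not 8-byte aligned")
    token = shadow_stack[new_ssp]                   # SSTK_READ_MEM.q [new_SSP]
    fault = False
    if not mode_64bit and (token >> 32) != 0:       # address must be below 4 GB
        fault = True
    if token & 0x01:                                # busy bit must be 0
        fault = True
    if (token & ~0x01) != new_ssp:                  # address in token must match
        fault = True
    if not fault:
        token |= 0x01                               # set the busy bit
    shadow_stack[new_ssp] = token                   # SSTK_WRITE_MEM.q [new_SSP]
    if fault:
        raise GeneralProtectionFault("invalid or busy token")

ss = {0x2000: 0x2000}
set_sstk_token_busy(ss, 0x2000)
print(hex(ss[0x2000]))  # prints 0x2001
```

A second call on the same token raises the fault, because the busy bit is now set.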
3 General-Purpose Instruction Reference


This chapter describes the function, mnemonic syntax, opcodes, affected flags, and possible 
exceptions generated by the general-purpose instructions. General-purpose instructions are used in 
basic software execution. Most of these instructions load, store, or operate on data located in the 
general-purpose registers (GPRs), in memory, or in both. The remaining instructions are used to alter 
the sequential flow of the program by branching to other locations within the program, or to entirely 
different programs. With the exception of the MOVD, MOVMSKPD and MOVMSKPS instructions, 
which operate on MMX/XMM registers, the instructions within the category of general-purpose 
instructions do not operate on any other register set. 


Most general-purpose instructions are supported in all hardware implementations of the AMD64 
architecture. However, some instructions in this group are optional and support must be determined by 
testing processor feature flags using the CPUID instruction. These instructions are listed in Table 3-1, 
along with the CPUID function, register and bit used to test for the presence of the instruction. 


Table 3-1. Instruction Support Indicated by CPUID Feature Bits

Instruction                               CPUID Function(s)        Register[Bit]  Feature Flag
ADCX, ADOX                                0000_0007h (ECX=0)       EBX[19]        ADX
Bit Manipulation Instructions - group 1   0000_0007h (ECX=0)       EBX[3]         BMI1
Bit Manipulation Instructions - group 2   0000_0007h (ECX=0)       EBX[8]         BMI2
CLFLOPT                                   0000_0007h (ECX=0)       EBX[23]        CLFLOPT
CLWB                                      0000_0007h (ECX=0)       EBX[24]        CLWB
CLZERO                                    8000_0008h               EBX[0]         CLZERO
CMPXCHG8B                                 0000_0001h, 8000_0001h   EDX[8]         CMPXCHG8B
CMPXCHG16B                                0000_0001h               ECX[13]        CMPXCHG16B
CMOVcc (Conditional Moves)                0000_0001h, 8000_0001h   EDX[15]        CMOV
CLFLUSH                                   0000_0001h               EDX[19]        CLFSH
CRC32                                     0000_0001h               ECX[20]        SSE42
LAHF, SAHF                                8000_0001h               ECX[0]         LahfSahf
LZCNT                                     8000_0001h               ECX[5]         ABM
Long Mode instructions                    8000_0001h               EDX[29]        LM
MCOMMIT                                   8000_0008h               EBX[8]         MCOMMIT
MFENCE, LFENCE                            0000_0001h               EDX[26]        SSE2
MONITORX, MWAITX                          8000_0001h               ECX[29]        MONITORX
MOVBE                                     0000_0001h               ECX[22]        MOVBE
MOVD (Note 1)                             0000_0001h, 8000_0001h   EDX[23]        MMX
                                          0000_0001h               EDX[26]        SSE2
MOVNTI                                    0000_0001h               EDX[26]        SSE2
POPCNT                                    0000_0001h               ECX[23]        POPCNT
PREFETCH / PREFETCHW (Note 2)             8000_0001h               ECX[8]         3DNowPrefetch
                                                                   EDX[29]        LM
                                                                   EDX[31]        3DNow
RDFSBASE, RDGSBASE, WRFSBASE, WRGSBASE    0000_0007h (ECX=0)       EBX[0]         FSGSBASE
RDPRU                                     8000_0008h               EBX[4]         RDPRU
RDRAND                                    0000_0001h               ECX[30]        RDRAND
RDSEED                                    0000_0007h (ECX=0)       EBX[18]        RDSEED
RDPID                                     0000_0007h (ECX=0)       ECX[22]        RDPID
SFENCE                                    0000_0001h               EDX[25]        SSE
Trailing Bit Manipulation Instructions    8000_0001h               ECX[21]        TBM

Notes:
1. The MOVD variant that moves values to or from MMX registers is part of the MMX subset; the MOVD variant that
   moves data to or from XMM registers is part of the SSE2 subset.
2. Instruction is supported if any one of the listed feature flags is set.


For more information on using the CPUID instruction, see the reference page for the CPUID 
instruction on page 165. For a comprehensive list of all instruction support feature flags, see 
Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 
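
Testing a feature flag from Table 3-1 reduces to checking a single bit of one register returned by CPUID. Python cannot execute CPUID directly, so the sketch below operates on register values supplied by the caller; the dictionary layout, the function name, and the sample values are all hypothetical.

```python
def has_feature(cpuid_regs, register, bit):
    """True if the given bit is set in the named CPUID output register."""
    return bool((cpuid_regs[register] >> bit) & 1)

# Made-up output of CPUID Fn0000_0007 with ECX=0: ADX (EBX[19]) and BMI1 (EBX[3]) set
fn7_regs = {"EAX": 0, "EBX": (1 << 19) | (1 << 3), "ECX": 0, "EDX": 0}

print(has_feature(fn7_regs, "EBX", 19))  # ADX  -> prints True
print(has_feature(fn7_regs, "EBX", 8))   # BMI2 -> prints False
```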


The general-purpose instructions can be used in legacy mode or 64-bit long mode. Compilation of
general-purpose programs for execution in 64-bit long mode offers three primary advantages: access
to the eight extended, 64-bit general-purpose registers (for a register set consisting of GPR0-GPR15),
access to the 64-bit virtual address space, and access to the RIP-relative addressing mode.

For further information about the general-purpose instructions and register resources, see:

• “General-Purpose Programming” in Volume 1.
• “Summary of Registers and Data Types” on page 38.
• “Notation” on page 53.
• “Instruction Prefixes” on page 5.
• Appendix B, “General-Purpose Instructions in 64-Bit Mode.” In particular, see “General Rules for
  64-Bit Mode” on page 555.
AAA ASCII Adjust After Addition


Adjusts the value in the AL register to an unpacked BCD value. Use the AAA instruction after using 
the ADD instruction to add two unpacked BCD numbers. 


The instruction is coded without explicit operands: 
AAA 


If the value in the lower nibble of AL is greater than 9 or the AF flag is set to 1, the instruction 
increments the AH register, adds 6 to the AL register, and sets the CF and AF flags to 1. Otherwise, it 
does not change the AH register and clears the CF and AF flags to 0. In either case, AAA clears bits 
7:4 of the AL register, leaving the correct decimal digit in bits 3:0. 
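
The adjustment described above can be expressed as a short Python model. It is an illustrative sketch, not the manual's pseudocode: the function name and the tuple return value are hypothetical, and register values are passed in and returned explicitly.

```python
def aaa(al, ah, af):
    """Model of AAA: returns (AL, AH, AF, CF) after the adjustment."""
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF
        ah = (ah + 1) & 0xFF
        af = cf = 1
    else:
        af = cf = 0
    return al & 0x0F, ah, af, cf    # bits 7:4 of AL are cleared in either case

# Unpacked BCD 8 + 5: ADD leaves AL = 0Dh with AF = 0
print(aaa(0x0D, 0x00, 0))  # prints (3, 1, 1, 1), that is, AH:AL = 1:3
```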


This instruction also makes it possible to add ASCII numbers without having to mask off the upper 
nibble ‘3’. 


Using this instruction in 64-bit mode generates an invalid-opcode exception. 


Mnemonic   Opcode   Description
AAA        37       Create an unpacked BCD number.
                    (Invalid in 64-bit mode.)

Related Instructions
AAD, AAM, AAS

rFLAGS Affected

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          U                   U    U    M    U    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.

Exceptions

Exception                 Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                           X          This instruction was executed in 64-bit mode.
AAD ASCII Adjust Before Division


Converts two unpacked BCD digits in the AL (least significant) and AH (most significant) registers to 
a single binary value in the AL register. 


The instruction is coded without explicit operands: 
AAD 
The instruction performs the following operation on the contents of AL and AH using the formula:
AL = ((10d * AH) + (AL))
After the conversion, AH is cleared to 00h.


In most modern assemblers, the AAD instruction adjusts from base-10 values. However, by coding the 
instruction directly in binary, it can adjust from any base specified by the immediate byte value (ib) 
suffixed onto the D5h opcode. For example, code D508h for octal, D50Ah for decimal, and D50Ch for 
duodecimal (base 12). 
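
The conversion, including the effect of a non-default immediate byte, can be sketched in Python as follows. The function name and the `base` parameter are hypothetical; `base` stands in for the ib value that follows the D5h opcode, and the result is truncated to 8 bits because it is stored in AL.

```python
def aad(al, ah, base=10):
    """Model of AAD: returns (AL, AH) after the conversion."""
    al = (ah * base + al) & 0xFF
    return al, 0x00                 # AH is cleared to 00h

print(aad(3, 1))           # digits 1 and 3, base 10 -> prints (13, 0)
print(aad(7, 2, base=8))   # digits 2 and 7, base 8  -> prints (23, 0)
```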


Using this instruction in 64-bit mode generates an invalid-opcode exception. 


Mnemonic   Opcode   Description
AAD        D5 0A    Adjust two BCD digits in AL and AH.
                    (Invalid in 64-bit mode.)
(None)     D5 ib    Adjust two BCD digits to the immediate byte base.
                    (Invalid in 64-bit mode.)

Related Instructions
AAA, AAM, AAS

rFLAGS Affected

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          U                   M    M    U    M    U
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.

Exceptions

Exception                 Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                           X          This instruction was executed in 64-bit mode.
AAM ASCII Adjust After Multiply


Converts the value in the AL register from binary to two unpacked BCD digits in the AH (most
significant) and AL (least significant) registers.

The instruction is coded without explicit operands:
AAM

The instruction performs the following operation on the contents of AL and AH using the formula:
AH = (AL/10d)
AL = (AL mod 10d)


In most modern assemblers, the AAM instruction adjusts to base-10 values. However, by coding the 
instruction directly in binary, it can adjust to any base specified by the immediate byte value (ib) 
suffixed onto the D4h opcode. For example, code D408h for octal, D40Ah for decimal, and D40Ch for 
duodecimal (base 12). 
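
A compact Python model of this operation is shown below as an illustrative sketch. The function name and the `base` parameter are hypothetical; `base` stands in for the ib value that follows the D4h opcode, and a zero base is reported with Python's ZeroDivisionError in place of the #DE exception.

```python
def aam(al, base=10):
    """Model of AAM: returns (AH, AL) after the conversion."""
    if base == 0:
        raise ZeroDivisionError("#DE: 8-bit immediate value was 0")
    return al // base, al % base

print(aam(63))           # 7 * 9 = 63 -> prints (6, 3)
print(aam(23, base=8))   # 23 decimal is 27 octal -> prints (2, 7)
```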


Using this instruction in 64-bit mode generates an invalid-opcode exception. 


Mnemonic   Opcode   Description
AAM        D4 0A    Create a pair of unpacked BCD values in AH and AL.
                    (Invalid in 64-bit mode.)
(None)     D4 ib    Create a pair of unpacked values to the immediate byte base.
                    (Invalid in 64-bit mode.)

Related Instructions
AAA, AAD, AAS

rFLAGS Affected

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          U                   M    M    U    M    U
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.

Exceptions

Exception                 Real  Virtual 8086  Protected  Cause of Exception
Divide by zero, #DE       X     X             X          8-bit immediate value was 0.
Invalid opcode, #UD                           X          This instruction was executed in 64-bit mode.
AAS ASCII Adjust After Subtraction


Adjusts the value in the AL register to an unpacked BCD value. Use the AAS instruction after using 
the SUB instruction to subtract two unpacked BCD numbers. 


The instruction is coded without explicit operands: 
AAS 


If the value in AL is greater than 9 or the AF flag is set to 1, the instruction decrements the value in 
AH, subtracts 6 from the AL register, and sets the CF and AF flags to 1. Otherwise, it clears the CF and 
AF flags and the AH register is unchanged. In either case, the instruction clears bits 7:4 of the AL 
register, leaving the correct decimal digit in bits 3:0. 
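
The following Python sketch models the adjustment. It is illustrative only: the function name and tuple return value are hypothetical, and the test is applied to the lower nibble of AL, which is an assumption made here for symmetry with the AAA description.

```python
def aas(al, ah, af):
    """Model of AAS: returns (AL, AH, AF, CF) after the adjustment."""
    if (al & 0x0F) > 9 or af:       # assumption: the lower nibble of AL is tested
        al = (al - 6) & 0xFF
        ah = (ah - 1) & 0xFF
        af = cf = 1
    else:
        af = cf = 0
    return al & 0x0F, ah, af, cf    # bits 7:4 of AL are cleared in either case

# Unpacked BCD 13 - 5: AH = 1, AL = 3; SUB AL, 5 leaves AL = FEh with AF = 1
print(aas(0xFE, 0x01, 1))  # prints (8, 0, 1, 1), that is, AH:AL = 0:8
```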


Using this instruction in 64-bit mode generates an invalid-opcode exception. 


Mnemonic   Opcode   Description
AAS        3F       Create an unpacked BCD number from the contents of
                    the AL register.
                    (Invalid in 64-bit mode.)

Related Instructions
AAA, AAD, AAM

rFLAGS Affected

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          U                   U    U    M    U    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.

Exceptions

Exception                 Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                           X          This instruction was executed in 64-bit mode.
ADC Add with Carry


Adds the carry flag (CF), the value in a register or memory location (first operand), and an immediate 
value or the value in a register or memory location (second operand), and stores the result in the first 
operand location. 


The instruction has two operands: 
ADC dest, src 


The instruction cannot add two memory operands. The CF flag indicates a pending carry from a 
previous addition operation. The instruction sign-extends an immediate value to the length of the 
destination register or memory location. 


This instruction evaluates the result for both signed and unsigned data types and sets the OF and CF 
flags to indicate a carry in a signed or unsigned result, respectively. It sets the SF flag to indicate the 
sign of a signed result. 


Use the ADC instruction after an ADD instruction as part of a multibyte or multiword addition. 
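
A multiword addition chains the carry from the low-order add into the high-order add. The Python sketch below models a 128-bit addition built from two 64-bit steps; the helper names are hypothetical and only the CF flag is modeled.

```python
MASK64 = (1 << 64) - 1

def add64(a, b, carry_in=0):
    """64-bit add returning (result, CF); carry_in = 0 models ADD, carry_in = CF models ADC."""
    total = a + b + carry_in
    return total & MASK64, total >> 64

def add128(a_lo, a_hi, b_lo, b_hi):
    lo, cf = add64(a_lo, b_lo)        # ADD low halves
    hi, cf = add64(a_hi, b_hi, cf)    # ADC high halves, consuming the pending carry
    return lo, hi, cf

print(add128(MASK64, 0, 1, 0))  # prints (0, 1, 0)
```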


The forms of the ADC instruction that write to memory support the LOCK prefix. For details about the 
LOCK prefix, see “Lock Prefix” on page 11. 


Mnemonic                Opcode     Description
ADC AL, imm8            14 ib      Add imm8 to AL + CF.
ADC AX, imm16           15 iw      Add imm16 to AX + CF.
ADC EAX, imm32          15 id      Add imm32 to EAX + CF.
ADC RAX, imm32          15 id      Add sign-extended imm32 to RAX + CF.
ADC reg/mem8, imm8      80 /2 ib   Add imm8 to reg/mem8 + CF.
ADC reg/mem16, imm16    81 /2 iw   Add imm16 to reg/mem16 + CF.
ADC reg/mem32, imm32    81 /2 id   Add imm32 to reg/mem32 + CF.
ADC reg/mem64, imm32    81 /2 id   Add sign-extended imm32 to reg/mem64 + CF.
ADC reg/mem16, imm8     83 /2 ib   Add sign-extended imm8 to reg/mem16 + CF.
ADC reg/mem32, imm8     83 /2 ib   Add sign-extended imm8 to reg/mem32 + CF.
ADC reg/mem64, imm8     83 /2 ib   Add sign-extended imm8 to reg/mem64 + CF.
ADC reg/mem8, reg8      10 /r      Add reg8 to reg/mem8 + CF.
ADC reg/mem16, reg16    11 /r      Add reg16 to reg/mem16 + CF.
ADC reg/mem32, reg32    11 /r      Add reg32 to reg/mem32 + CF.
ADC reg/mem64, reg64    11 /r      Add reg64 to reg/mem64 + CF.
ADC reg8, reg/mem8      12 /r      Add reg/mem8 to reg8 + CF.
ADC reg16, reg/mem16    13 /r      Add reg/mem16 to reg16 + CF.
ADC reg32, reg/mem32    13 /r      Add reg/mem32 to reg32 + CF.
ADC reg64, reg/mem64    13 /r      Add reg/mem64 to reg64 + CF.

Related Instructions
ADD, SBB, SUB

rFLAGS Affected

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          M                   M    M    M    M    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.

Exceptions

Exception                 Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS                X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP   X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                              X          The destination operand was in a non-writable segment.
                                              X          A null data segment was used to reference memory.
Page fault, #PF                 X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC            X             X          An unaligned memory reference was performed while alignment checking was enabled.
ADCX Unsigned ADD with Carry Flag


Adds the value in a register (first operand) with a register or memory (second operand) and the carry
flag, and stores the result in the first operand location. This instruction sets the CF based on the
unsigned addition. This instruction is useful in multi-precision addition algorithms.


This is an ADX instruction. Support for this instruction is indicated by CPUID
Fn0000_0007_EBX[ADX]=1.


Mnemonic                Opcode           Description
ADCX reg32, reg/mem32   66 0F 38 F6 /r   Unsigned add with carry flag.
ADCX reg64, reg/mem64   66 0F 38 F6 /r   Unsigned add with carry flag.


Related Instructions 
ADOX 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       |    |    |    |    |    |    |    |    | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are
blank. Undefined flags are U.


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | The destination operand was in a non-writable segment.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled.
Exception Real Virtual 8086 Protected Cause of Exception
Invalid opcode, #UD X X X Instruction not supported by CPUID Fn0000_0007_EBX[ADX] = 0.
Invalid opcode, #UD X X Lock prefix (F0h) preceding opcode.
ADD Signed or Unsigned Add


Adds the value in a register or memory location (first operand) and an immediate value or the value in 
a register or memory location (second operand), and stores the result in the first operand location. 
The instruction has two operands: 

ADD dest, src 


The instruction cannot add two memory operands. The instruction sign-extends an immediate value to 
the length of the destination register or memory operand. 


This instruction evaluates the result for both signed and unsigned data types: it sets the OF flag to
indicate an overflow of a signed result and the CF flag to indicate a carry out of an unsigned result.
It sets the SF flag to indicate the sign of a signed result.
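
The signed and unsigned readings of one addition can be checked with a small Python model. This is a sketch under stated assumptions: the function name `add_flags` and the 8-bit default operand width are chosen for illustration, and only CF, OF, and SF are modeled.

```python
def add_flags(dest, src, bits=8):
    """Model ADD on `bits`-wide operands; return (result, CF, OF, SF)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = dest & mask, src & mask
    total = a + b
    result = total & mask
    cf = int(total > mask)  # unsigned carry out of the top bit
    # Signed overflow: both inputs share a sign that the result does not have.
    of = int(((a ^ result) & (b ^ result) & sign) != 0)
    sf = int((result & sign) != 0)
    return result, cf, of, sf
```

For example, `add_flags(0x7F, 0x01)` overflows as a signed addition (127 + 1) but produces no unsigned carry, while `add_flags(0xFF, 0x01)` carries as an unsigned addition but is a valid signed result (-1 + 1 = 0).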


The forms of the ADD instruction that write to memory support the LOCK prefix. For details about the 
LOCK prefix, see “Lock Prefix” on page 11. 


Mnemonic Opcode Description 

ADD AL, imm8 04 ib Add imm8 to AL.
ADD AX, imm16 05 iw Add imm16 to AX.
ADD EAX, imm32 05 id Add imm32 to EAX.
ADD RAX, imm32 05 id Add sign-extended imm32 to RAX.
ADD reg/mem8, imm8 80 /0 ib Add imm8 to reg/mem8.
ADD reg/mem16, imm16 81 /0 iw Add imm16 to reg/mem16.
ADD reg/mem32, imm32 81 /0 id Add imm32 to reg/mem32.
ADD reg/mem64, imm32 81 /0 id Add sign-extended imm32 to reg/mem64.
ADD reg/mem16, imm8 83 /0 ib Add sign-extended imm8 to reg/mem16.
ADD reg/mem32, imm8 83 /0 ib Add sign-extended imm8 to reg/mem32.
ADD reg/mem64, imm8 83 /0 ib Add sign-extended imm8 to reg/mem64.
ADD reg/mem8, reg8 00 /r Add reg8 to reg/mem8.
ADD reg/mem16, reg16 01 /r Add reg16 to reg/mem16.
ADD reg/mem32, reg32 01 /r Add reg32 to reg/mem32.
ADD reg/mem64, reg64 01 /r Add reg64 to reg/mem64.
ADD reg8, reg/mem8 02 /r Add reg/mem8 to reg8.
ADD reg16, reg/mem16 03 /r Add reg/mem16 to reg16.
ADD reg32, reg/mem32 03 /r Add reg/mem32 to reg32.
ADD reg64, reg/mem64 03 /r Add reg/mem64 to reg64.

Related Instructions
ADC, SBB, SUB 
rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | The destination operand was in a non-writable segment.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled.
ADOX Unsigned ADD with Overflow Flag


Adds the value in a register (first operand), the value in a register or memory location (second
operand), and the overflow flag, and stores the result in the first operand location. The instruction
sets OF to the carry out of the unsigned addition; no other flag is affected. This instruction is
useful in multi-precision addition algorithms.
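
Because ADOX carries through OF and leaves CF alone, a second carry chain can run alongside one kept in CF by ADCX. The Python sketch below models that idea; it is an assumption-laden illustration, not the manual's pseudocode. The names `add_with_flag` and `interleaved_sums` and the least-significant-limb-first lists are hypothetical.

```python
def add_with_flag(dest, src, flag_in, bits=64):
    """Unsigned add of dest + src + flag_in; return (result, carry_out)."""
    mask = (1 << bits) - 1
    total = (dest & mask) + (src & mask) + (flag_in & 1)
    return total & mask, total >> bits


def interleaved_sums(a, b, c, d, bits=64):
    """Compute a+b and c+d limb by limb using two independent carry chains."""
    cf = 0  # carry chain an ADCX sequence would keep in CF
    of = 0  # carry chain an ADOX sequence would keep in OF
    sum_ab, sum_cd = [], []
    for a_i, b_i, c_i, d_i in zip(a, b, c, d):
        limb, cf = add_with_flag(a_i, b_i, cf, bits)  # ADCX-like step
        sum_ab.append(limb)
        limb, of = add_with_flag(c_i, d_i, of, bits)  # ADOX-like step
        sum_cd.append(limb)
    return (sum_ab, cf), (sum_cd, of)
```

The two steps inside the loop never read each other's flag, so neither chain disturbs the other even though the steps alternate.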


This is an ADX instruction. Support for this instruction is indicated by CPUID
Fn0000_0007_EBX[ADX] = 1.


Mnemonic Opcode Description 
ADOX reg32, reg/mem32 F3 0F 38 F6 /r Unsigned add with overflow flag.
ADOX reg64, reg/mem64 F3 0F 38 F6 /r Unsigned add with overflow flag.


Related Instructions 
ADCX 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    |    |    |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are
blank. Undefined flags are U.


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | The destination operand was in a non-writable segment.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled.




Exception Real Virtual 8086 Protected Cause of Exception
Invalid opcode, #UD X X X Instruction not supported by CPUID Fn0000_0007_EBX[ADX] = 0.
Invalid opcode, #UD X X Lock prefix (F0h) preceding opcode.
AND Logical AND


Performs a bit-wise logical and operation on the value in a register or memory location (first operand)
and an immediate value or the value in a register or memory location (second operand), and stores the
result in the first operand location. The two operands cannot both be memory locations.
The instruction has two operands: 

AND dest, src 


The instruction sets each bit of the result to 1 if the corresponding bit of both operands is set; 
otherwise, it clears the bit to 0. The following table shows the truth table for the logical and operation: 


X Y X and Y
0 0 0
0 1 0
1 0 0
1 1 1
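
Applied across a whole operand, the truth table gives the result below. The Python model is a sketch only: `and_flags` is a hypothetical name, the 8-bit default width is an assumption, and it reports just SF and ZF. OF and CF are always cleared by this instruction, as the rFLAGS table for AND shows.

```python
def and_flags(dest, src, bits=8):
    """Model AND; return (result, SF, ZF). OF and CF are always cleared."""
    mask = (1 << bits) - 1
    result = dest & src & mask
    sf = result >> (bits - 1)  # top bit of the result
    zf = int(result == 0)
    return result, sf, zf
```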


The forms of the AND instruction that write to memory support the LOCK prefix. For details about the 
LOCK prefix, see “Lock Prefix” on page 11. 


Mnemonic Opcode Description

AND AL, imm8 24 ib AND the contents of AL with an immediate 8-bit value and store the result in AL.
AND AX, imm16 25 iw AND the contents of AX with an immediate 16-bit value and store the result in AX.
AND EAX, imm32 25 id AND the contents of EAX with an immediate 32-bit value and store the result in EAX.
AND RAX, imm32 25 id AND the contents of RAX with a sign-extended immediate 32-bit value and store the result in RAX.
AND reg/mem8, imm8 80 /4 ib AND the contents of reg/mem8 with imm8.
AND reg/mem16, imm16 81 /4 iw AND the contents of reg/mem16 with imm16.
AND reg/mem32, imm32 81 /4 id AND the contents of reg/mem32 with imm32.
AND reg/mem64, imm32 81 /4 id AND the contents of reg/mem64 with sign-extended imm32.
AND reg/mem16, imm8 83 /4 ib AND the contents of reg/mem16 with a sign-extended 8-bit value.
AND reg/mem32, imm8 83 /4 ib AND the contents of reg/mem32 with a sign-extended 8-bit value.
AND reg/mem64, imm8 83 /4 ib AND the contents of reg/mem64 with a sign-extended 8-bit value.
AND reg/mem8, reg8 20 /r AND the contents of an 8-bit register or memory location with the contents of an 8-bit register.
AND reg/mem16, reg16 21 /r AND the contents of a 16-bit register or memory location with the contents of a 16-bit register.
AND reg/mem32, reg32 21 /r AND the contents of a 32-bit register or memory location with the contents of a 32-bit register.
AND reg/mem64, reg64 21 /r AND the contents of a 64-bit register or memory location with the contents of a 64-bit register.
AND reg8, reg/mem8 22 /r AND the contents of an 8-bit register with the contents of an 8-bit memory location or register.
AND reg16, reg/mem16 23 /r AND the contents of a 16-bit register with the contents of a 16-bit memory location or register.
AND reg32, reg/mem32 23 /r AND the contents of a 32-bit register with the contents of a 32-bit memory location or register.
AND reg64, reg/mem64 23 /r AND the contents of a 64-bit register with the contents of a 64-bit memory location or register.


Related Instructions

TEST, OR, NOT, NEG, XOR


rFLAGS Affected 

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | M  | 0
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | The destination operand was in a non-writable segment.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled.
ANDN Logical And-Not


Performs a bit-wise logical and of the second source operand and the one's complement of the first 
source operand and stores the result into the destination operand. 


This instruction has three operands: 
ANDN dest, src1, src2


In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand 
size is 64-bit; if VEX.W is 0, the operand size is 32-bit. In 32-bit mode, VEX.W is ignored. 16-bit 
operands are not supported. 


The destination operand (dest) is always a general purpose register. 


The first source operand (src1) is a general purpose register and the second source operand (src2) is 
either a general purpose register or a memory operand. 


This instruction implements the following operation: 


not tmp, src1
and dest, tmp, src2


The flags are set according to the result of the and pseudo-operation. 
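
The two pseudo-operations above reduce to a single expression. The following Python model is a minimal sketch; the function name `andn` and the 32-bit default operand size are assumptions for the example, and flag results are not modeled.

```python
def andn(src1, src2, bits=32):
    """Model ANDN: dest = (NOT src1) AND src2, truncated to the operand size."""
    mask = (1 << bits) - 1
    return ~src1 & src2 & mask
```

A common use is clearing in `src2` every bit that is set in a mask held in `src1`; for example, `andn(0b1100, 0b1010)` yields `0b0010`.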


The ANDN instruction is a BMI1 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI1] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


VEX RXB.map_select W.vvvv.L.pp Opcode
ANDN reg32, reg32, reg/mem32 C4 RXB.02 0.src1.0.00 F2 /r
ANDN reg64, reg64, reg/mem64 C4 RXB.02 1.src1.0.00 F2 /r


Related Instructions 


BEXTR, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR, BLSMSK, BSF, BSR,
LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | 0
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | BMI instructions are only recognized in protected mode.
Invalid opcode, #UD | | | X | BMI instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI] = 0.
Invalid opcode, #UD | | | X | VEX.L is 1.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | | X | An unaligned memory reference was performed while alignment checking was enabled.
BEXTR Bit Field Extract
(register form) 
Extracts a contiguous field of bits from the first source operand, as specified by the control field setting 
in the second source operand and puts the extracted field into the least significant bit positions of the 
destination. The remaining bits in the destination register are cleared to 0. 
This instruction has three operands: 

BEXTR dest, src, cntl 


In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand 
size is 64-bit; if VEX.W is 0, the operand size is 32-bit. In 32-bit mode, VEX.W is ignored. 16-bit 
operands are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is either a general purpose register or a memory operand. 


The control (cntl) operand is a general purpose register that provides two fields describing the range of
bits to extract:


• lsb_index (in bits 7:0): specifies the index of the least significant bit of the field.
• length (in bits 15:8): specifies the number of bits in the field.


The position of the extracted field can be expressed as:
[lsb_index + length - 1] : [lsb_index]


For example, if the lsb_index is 7 and length is 5, then bits 11:7 of the source will be copied to bits 4:0
of the destination, with the rest of the destination being zero-filled. Zeros are provided for any bit
positions in the specified range that lie beyond the most significant bit of the source operand. A length
value of zero results in all zeros being written to the destination.
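
The extraction rule can be modeled compactly in Python. This is an illustrative sketch, not the manual's definition: the function name `bextr` and the 32-bit default operand size are assumptions, and flag results are not modeled.

```python
def bextr(src, cntl, bits=32):
    """Model BEXTR: extract `length` bits of `src` starting at `lsb_index`."""
    mask = (1 << bits) - 1
    lsb_index = cntl & 0xFF            # control bits 7:0
    length = (cntl >> 8) & 0xFF        # control bits 15:8
    field = (src & mask) >> lsb_index  # positions past the MSB read as zero
    return field & ((1 << length) - 1)
```

With lsb_index 7 and length 5, the control value is `(5 << 8) | 7`, and the call returns bits 11:7 of the source in bits 4:0 of the result.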


This form of the BEXTR instruction is a BMI1 instruction. Support for this instruction is indicated by 
CPUID Fn0000_0007_EBX_x0[BMI1] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


VEX RXB.map_select W.vvvv.L.pp Opcode
BEXTR reg32, reg/mem32, reg32 C4 RXB.02 0.cntl.0.00 F7 /r
BEXTR reg64, reg/mem64, reg64 C4 RXB.02 1.cntl.0.00 F7 /r


Related Instructions

ANDN, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR, BLSMSK, BSF, BSR,
LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | U  | M  | U  | U  | 0
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | BMI instructions are only recognized in protected mode.
Invalid opcode, #UD | | | X | BMI instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI] = 0.
Invalid opcode, #UD | | | X | VEX.L is 1.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | | X | An unaligned memory reference was performed while alignment checking was enabled.
BEXTR Bit Field Extract
(immediate form) 
Extracts a contiguous field of bits from the first source operand, as specified by the control field setting 
in the second source operand and puts the extracted field into the least significant bit positions of the 
destination. The remaining bits in the destination register are cleared to 0. 
This instruction has three operands: 

BEXTR dest, src, cntl 


In 64-bit mode, the operand size is determined by the value of XOP.W. If XOP.W is 1, the operand size 
is 64-bit; if XOP.W is 0, the operand size is 32-bit. In 32-bit mode, XOP.W is ignored. 16-bit operands 
are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is either a general purpose register or a memory operand. 


The control (cntl) operand is a 32-bit immediate value that provides two fields describing the range of
bits to extract:


• lsb_index (in immediate operand bits 7:0): specifies the index of the least significant bit of the
field.
• length (in immediate operand bits 15:8): specifies the number of bits in the field.


The position of the extracted field can be expressed as:
[lsb_index + length - 1] : [lsb_index]


For example, if the lsb_index is 7 and length is 5, then bits 11:7 of the source will be copied to bits 4:0
of the destination, with the rest of the destination being zero-filled. Zeros are provided for any bit
positions in the specified range that lie beyond the most significant bit of the source operand. A length
value of zero results in all zeros being written to the destination.
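
For the immediate form, the control value is fixed at assembly time, so it helps to see how the two fields pack into it. The Python sketch below is illustrative only; the helper names `bextr_control` and `bextr_imm` and the 32-bit default operand size are assumptions, and flag results are not modeled.

```python
def bextr_control(lsb_index, length):
    """Pack lsb_index (bits 7:0) and length (bits 15:8) into a control value."""
    if not (0 <= lsb_index <= 0xFF and 0 <= length <= 0xFF):
        raise ValueError("each field must fit in 8 bits")
    return (length << 8) | lsb_index


def bextr_imm(src, imm32, bits=32):
    """Model the immediate form: decode the fields, then extract the bit field."""
    mask = (1 << bits) - 1
    lsb_index = imm32 & 0xFF
    length = (imm32 >> 8) & 0xFF
    return ((src & mask) >> lsb_index) & ((1 << length) - 1)
```

For the example in the text (lsb_index 7, length 5), `bextr_control(7, 5)` gives the immediate `0x0507`.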


This form of the BEXTR instruction is a TBM instruction. Support for this instruction is indicated by
CPUID Fn8000_0001_ECX[TBM] = 1.


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


XOP RXB.map_select W.vvvv.L.pp Opcode 
BEXTR reg32, reg/mem32, imm32 8F RXB.0A 0.1111.0.00 10 /r /id
BEXTR reg64, reg/mem64, imm32 8F RXB.0A 1.1111.0.00 10 /r /id


Related Instructions

ANDN, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR, BLSMSK, BSF, BSR,
LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | U  | M  | U  | U  | 0
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | TBM instructions are only recognized in protected mode.
Invalid opcode, #UD | | | X | TBM instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[TBM] = 0.
Invalid opcode, #UD | | | X | XOP.L is 1.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | | X | An unaligned memory reference was performed while alignment checking was enabled.
BLCFILL Fill From Lowest Clear Bit


Finds the least significant zero bit in the source operand, clears all bits below that bit to 0 and writes 
the result to the destination. If there is no zero bit in the source operand, the destination is written with 
all zeros. 
This instruction has two operands: 

BLCFILL dest, src 


In 64-bit mode, the operand size is determined by the value of XOP.W. If XOP.W is 1, the operand size 
is 64-bit; if XOP.W is 0, the operand size is 32-bit. In 32-bit mode, XOP.W is ignored. 16-bit operands 
are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is a general purpose register or a memory operand. 


The BLCFILL instruction effectively performs a bit-wise logical and of the source operand and the 
result of incrementing the source operand by 1 and stores the result to the destination register: 


add tmp, src, 1 
and dest, tmp, src 


The value of the carry flag of rFLAGS is generated according to the result of the add pseudo- 
instruction and the remaining arithmetic flags are generated by the and pseudo-instruction. 
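
The add/and pair above can be traced with a short Python model. It is a sketch under stated assumptions: the function name `blcfill` and the 32-bit default operand size are chosen for illustration, and only the result and CF are returned.

```python
def blcfill(src, bits=32):
    """Model BLCFILL: return ((src + 1) AND src, CF from the add)."""
    mask = (1 << bits) - 1
    src &= mask
    total = src + 1
    cf = total >> bits  # 1 only when src was all ones and the add wrapped
    return (total & mask) & src, cf
```

For `src = 0b0101_0111` the lowest clear bit is bit 3, so the three set bits below it are cleared and the result is `0b0101_0000`.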


The BLCFILL instruction is a TBM instruction. Support for this instruction is indicated by CPUID 
Fn8000_0001_ECX[TBM] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


XOP RXB.map_select W.vvvv.L.pp Opcode 
BLCFILL reg32, reg/mem32 8F RXB.09 0.dest.0.00 01 /1
BLCFILL reg64, reg/mem64 8F RXB.09 1.dest.0.00 01 /1


Related Instructions 


ANDN, BEXTR, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR, BLSMSK, BSF,
BSR, LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | TBM instructions are only recognized in protected mode.
Invalid opcode, #UD | | | X | TBM instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[TBM] = 0.
Invalid opcode, #UD | | | X | XOP.L is 1.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | | X | An unaligned memory reference was performed while alignment checking was enabled.
BLCI Isolate Lowest Clear Bit
Finds the least significant zero bit in the source operand, sets all other bits to 1 and writes the result to 
the destination. If there is no zero bit in the source operand, the destination is written with all ones. 
This instruction has two operands: 

BLCI dest, src 


In 64-bit mode, the operand size is determined by the value of XOP.W. If XOP.W is 1, the operand size 
is 64-bit; if XOP.W is 0, the operand size is 32-bit. In 32-bit mode, XOP.W is ignored. 16-bit operands 
are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is a general purpose register or a memory operand. 


The BLCI instruction effectively performs a bit-wise logical or of the source operand and the inverse 
of the result of incrementing the source operand by 1, and stores the result to the destination register: 


add tmp, src, 1 
not tmp, tmp 
or dest, tmp, src 


The value of the carry flag of rFLAGS is generated according to the result of the add pseudo- 
instruction and the remaining arithmetic flags are generated by the or pseudo-instruction. 
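
A small Python model makes the add/not/or sequence concrete. This is an illustrative sketch: the function name `blci` and the 32-bit default operand size are assumptions, and only the result and CF are returned.

```python
def blci(src, bits=32):
    """Model BLCI: return (src OR NOT(src + 1), CF from the add)."""
    mask = (1 << bits) - 1
    src &= mask
    total = src + 1
    cf = total >> bits   # 1 only when src was all ones
    tmp = ~total & mask  # complement of the incremented value, truncated
    return tmp | src, cf
```

The result has every bit set except the position of the lowest clear bit of the source; for `src = 0x57` that is bit 3, giving `0xFFFFFFF7`.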


The BLCI instruction is a TBM instruction. Support for this instruction is indicated by CPUID 
Fn8000_0001_ECX[TBM] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


XOP RXB.map_select W.vvvv.L.pp Opcode 
BLCI reg32, reg/mem32 8F RXB.09 0.dest.0.00 02 /6 
BLCI reg64, reg/mem64 8F RXB.09 1.dest.0.00 02 /6 


Related Instructions 


ANDN, BEXTR, BLCFILL, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR, BLSMSK,
BSF, BSR, LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | TBM instructions are only recognized in protected mode.
Invalid opcode, #UD | | | X | TBM instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[TBM] = 0.
Invalid opcode, #UD | | | X | XOP.L is 1.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | | X | An unaligned memory reference was performed while alignment checking was enabled.
BLCIC Isolate Lowest Clear Bit and Complement
Finds the least significant zero bit in the source operand, sets that bit to 1, clears all other bits to 0 and 
writes the result to the destination. If there is no zero bit in the source operand, the destination is 
written with all zeros. 
This instruction has two operands: 

BLCIC dest, src 


In 64-bit mode, the operand size is determined by the value of XOP.W. If XOP.W is 1, the operand size 
is 64-bit; if XOP.W is 0, the operand size is 32-bit. In 32-bit mode, XOP.W is ignored. 16-bit operands 
are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is a general purpose register or a memory operand. 


The BLCIC instruction effectively performs a bit-wise logical and of the bit-wise complement of the
source operand and the result of incrementing the source operand by 1, and stores the result to the
destination register:


add tmp1, src, 1
not tmp2, src
and dest, tmp1, tmp2


The value of the carry flag of rFLAGS is generated according to the result of the add pseudo- 
instruction and the remaining arithmetic flags are generated by the and pseudo-instruction. 
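
The three pseudo-instructions can be followed step by step in Python. This is a sketch with assumed names: `blcic` and the 32-bit default operand size are not from the manual, and only the result and CF are returned.

```python
def blcic(src, bits=32):
    """Model BLCIC: return ((src + 1) AND NOT src, CF from the add)."""
    mask = (1 << bits) - 1
    src &= mask
    total = src + 1
    cf = total >> bits  # 1 only when src was all ones
    return (total & mask) & (~src & mask), cf
```

Only the bit at the position of the lowest clear bit survives: `blcic(0x57)` returns `0x08`, and an all-ones source returns 0 with CF set.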


The BLCIC instruction is a TBM instruction. Support for this instruction is indicated by CPUID 
Fn8000_0001_ECX[TBM] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 
Mnemonic Encoding 
XOP RXB.map_select W.vvvv.L.pp Opcode 
BLCIC reg32, reg/mem32 8F RXB.09 0.dest.0.00 01 /5
BLCIC reg64, reg/mem64 8F RXB.09 1.dest.0.00 01 /5


Related Instructions 


ANDN, BEXTR, BLCFILL, BLCI, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR, BLSMSK,
BSF, BSR, LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | TBM instructions are only recognized in protected mode.
Invalid opcode, #UD | | | X | TBM instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[TBM] = 0.
Invalid opcode, #UD | | | X | XOP.L is 1.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | | X | An unaligned memory reference was performed while alignment checking was enabled.
BLCMSK Mask From Lowest Clear Bit


Finds the least significant zero bit in the source operand, sets that bit to 1, clears all bits above that bit 
to 0 and writes the result to the destination. If there is no zero bit in the source operand, the destination 
is written with all ones. 


This instruction has two operands: 
BLCMSK dest, src 


In 64-bit mode, the operand size is determined by the value of XOP.W. If XOP.W is 1, the operand size 
is 64-bit; if XOP.W is 0, the operand size is 32-bit. In 32-bit mode, XOP.W is ignored. 16-bit operands 
are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is a general purpose register or a memory operand. 


The BLCMSK instruction effectively performs a bit-wise logical xor of the source operand and the 
result of incrementing the source operand by 1 and stores the result to the destination register: 


add tmp1, src, 1
xor dest, tmp1, src


The value of the carry flag of rFLAGS is generated according to the result of the add pseudo- 
instruction and the remaining arithmetic flags are generated by the xor pseudo-instruction. 


If the input is all ones, the output is a value with all bits set to 1. 
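
The mask that BLCMSK produces, including the all-ones case, can be reproduced with a short Python model. It is an illustrative sketch only: the function name `blcmsk` and the 32-bit default operand size are assumptions, and only the result and CF are returned.

```python
def blcmsk(src, bits=32):
    """Model BLCMSK: return ((src + 1) XOR src, CF from the add)."""
    mask = (1 << bits) - 1
    src &= mask
    total = src + 1
    cf = total >> bits  # 1 only when src was all ones
    return (total & mask) ^ src, cf
```

The increment flips the lowest clear bit and every bit below it, so the exclusive-or leaves exactly those positions set: `blcmsk(0x57)` returns `0x0F`.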


The BLCMSK instruction is a TBM instruction. Support for this instruction is indicated by CPUID 
Fn8000_0001_ECX[TBM] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Instruction Encoding 
Mnemonic Encoding 


XOP RXB.map_select W.vvvv.L.pp Opcode 
BLCMSK reg32, reg/mem32 8F RXB.09 0.dest.0.00 02 /1
BLCMSK reg64, reg/mem64 8F RXB.09 1.dest.0.00 02 /1


Related Instructions 


ANDN, BEXTR, BLCFILL, BLCI, BLCS, BLSFILL, BLSI, BLSIC, BLSR, BLSMSK, BSF, BSR,
LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | TBM instructions are only recognized in protected mode.
Invalid opcode, #UD | | | X | TBM instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[TBM] = 0.
Invalid opcode, #UD | | | X | XOP.L is 1.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | | X | An unaligned memory reference was performed while alignment checking was enabled.
BLCS Set Lowest Clear Bit


Finds the least significant zero bit in the source operand, sets that bit to 1 and writes the result to the 
destination. If there is no zero bit in the source operand, the source is copied to the destination (and CF 
in rFLAGS is set to 1). 


This instruction has two operands: 
BLCS dest, src 


In 64-bit mode, the operand size is determined by the value of XOP.W. If XOP.W is 1, the operand size 
is 64-bit; if XOP.W is 0, the operand size is 32-bit. In 32-bit mode, XOP.W is ignored. 16-bit operands 
are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is a general purpose register or a memory operand. 


The BLCS instruction effectively performs a bit-wise logical or of the source operand and the result 
of incrementing the source operand by 1, and stores the result to the destination register: 


add tmp, src, 1 
or dest, tmp, src 


The value of the carry flag of rFLAGS is generated by the add pseudo-instruction and the remaining 
arithmetic flags are generated by the or pseudo-instruction. 
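
The two pseudo-instructions above can be mirrored in a short Python sketch. The name `blcs_model` and the `bits` parameter are hypothetical, and the model covers only the destination value and the carry flag, not the other arithmetic flags.

```python
def blcs_model(src, bits=32):
    """Hypothetical model of BLCS: dest = src | (src + 1).

    Returns (dest, cf) for an operand size of `bits` (32 or 64).
    """
    mask = (1 << bits) - 1
    src &= mask
    tmp = src + 1                 # add tmp, src, 1
    cf = tmp >> bits              # carry out of the add pseudo-instruction
    dest = (tmp | src) & mask     # or dest, tmp, src
    return dest, cf


print(blcs_model(0b0101))         # (7, 0): lowest clear bit (bit 1) is now set
print(blcs_model(0xFFFFFFFF))     # (4294967295, 1): no zero bit, source copied, CF = 1
```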


The BLCS instruction is a TBM instruction. Support for this instruction is indicated by CPUID 
Fn8000_0001_ECX[TBM] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


Mnemonic | XOP | RXB.map_select | W.vvvv.L.pp | Opcode
BLCS reg32, reg/mem32 | 8F | RXB.09 | 0.dest.0.00 | 01 /3
BLCS reg64, reg/mem64 | 8F | RXB.09 | 1.dest.0.00 | 01 /3


Related Instructions 


ANDN, BEXTR, BLCFILL, BLCI, BLCIC, BLCMSK, BLSFILL, BLSI, BLSIC, BLSR, BLSMSK, 
BSF, BSR, LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X |   | TBM instructions are only recognized in protected mode.
Invalid opcode, #UD |   |   | X | TBM instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[TBM] = 0.
Invalid opcode, #UD |   |   | X | XOP.L is 1.
Stack, #SS |   |   | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP |   |   | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   |   | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   |   | X | An unaligned memory reference was performed while alignment checking was enabled.


BLSFILL Fill From Lowest Set Bit


Finds the least significant one bit in the source operand, sets all bits below that bit to 1 and writes the 
result to the destination. If there is no one bit in the source operand, the destination is written with all 
ones.


This instruction has two operands:

BLSFILL dest, src 


In 64-bit mode, the operand size is determined by the value of XOP.W. If XOP.W is 1, the operand size 
is 64-bit; if XOP.W is 0, the operand size is 32-bit. In 32-bit mode, XOP.W is ignored. 16-bit operands 
are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is a general purpose register or a memory operand. 


The BLSFILL instruction effectively performs a bit-wise logical or of the source operand and the 
result of subtracting 1 from the source operand, and stores the result to the destination register: 


sub tmp, src, 1 
or dest, tmp, src 


The value of the carry flag of rFLAGS is generated by the sub pseudo-instruction and the remaining
arithmetic flags are generated by the or pseudo-instruction.
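
A minimal Python sketch of the sub/or sequence follows. The name `blsfill_model` and the `bits` parameter are hypothetical, and only the destination and the carry flag are modeled.

```python
def blsfill_model(src, bits=32):
    """Hypothetical model of BLSFILL: dest = src | (src - 1).

    Returns (dest, cf) for an operand size of `bits` (32 or 64).
    """
    mask = (1 << bits) - 1
    src &= mask
    cf = 1 if src == 0 else 0     # borrow out of the sub pseudo-instruction
    tmp = (src - 1) & mask        # sub tmp, src, 1 (wraps to all ones for src == 0)
    dest = tmp | src              # or dest, tmp, src
    return dest, cf


print(blsfill_model(0b1000))      # (15, 0): bits below the lowest set bit are filled
print(blsfill_model(0))           # (4294967295, 1): no one bit, all ones written
```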


The BLSFILL instruction is a TBM instruction. Support for this instruction is indicated by CPUID 
Fn8000_0001_ECX[TBM] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


Mnemonic | XOP | RXB.map_select | W.vvvv.L.pp | Opcode
BLSFILL reg32, reg/mem32 | 8F | RXB.09 | 0.dest.0.00 | 01 /2
BLSFILL reg64, reg/mem64 | 8F | RXB.09 | 1.dest.0.00 | 01 /2


Related Instructions 


ANDN, BEXTR, BLCFILL, BLCI, BLCIC, BLCMSK, BLCS, BLSI, BLSIC, BLSR, BLSMSK, BSF, 
BSR, LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X |   | TBM instructions are only recognized in protected mode.
Invalid opcode, #UD |   |   | X | TBM instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[TBM] = 0.
Invalid opcode, #UD |   |   | X | XOP.L is 1.
Stack, #SS |   |   | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP |   |   | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   |   | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   |   | X | An unaligned memory reference was performed while alignment checking was enabled.


BLSI Isolate Lowest Set Bit


Clears all bits in the source operand except for the least significant bit that is set to 1 and writes the 
result to the destination. If the source is all zeros, the destination is written with all zeros. 


This instruction has two operands: 
BLSI dest, src 


In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand 
size is 64-bit; if VEX.W is 0, the operand size is 32-bit. In 32-bit mode, VEX.W is ignored. 16-bit 
operands are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is either a general purpose register or a memory operand.


This instruction implements the following operation: 


neg tmp, src
and dest, tmp, src


The value of the carry flag is generated by the neg pseudo-instruction and the remaining status flags 
are generated by the and pseudo-instruction. 
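
The neg/and sequence can be sketched in Python as follows. The names `blsi_model` and `set_bit_values` are hypothetical, and the sketch assumes the usual neg behavior in which the carry flag is 1 unless the operand is zero.

```python
def blsi_model(src, bits=32):
    """Hypothetical model of BLSI: dest = src & -src.

    Returns (dest, cf) for an operand size of `bits` (32 or 64).
    """
    mask = (1 << bits) - 1
    src &= mask
    tmp = (-src) & mask           # neg pseudo-instruction (two's complement)
    cf = 1 if src != 0 else 0     # assumed neg behavior: CF = 0 only for a zero operand
    dest = tmp & src              # and pseudo-instruction
    return dest, cf


def set_bit_values(value):
    """List the isolated set bits of `value`, lowest first."""
    out = []
    while value:
        lowest, _ = blsi_model(value)
        out.append(lowest)
        value ^= lowest           # remove the bit just reported
    return out


print(blsi_model(0b10100))        # (4, 1)
print(set_bit_values(0b10110))    # [2, 4, 16]
```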


The BLSI instruction is a BMI1 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI1] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
BLSI reg32, reg/mem32 | C4 | RXB.02 | 0.dest.0.00 | F3 /3
BLSI reg64, reg/mem64 | C4 | RXB.02 | 1.dest.0.00 | F3 /3


Related Instructions 


ANDN, BEXTR, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSIC, BLSR, BLSMSK, BSF, BSR, 
LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X |   | BMI instructions are only recognized in protected mode.
Invalid opcode, #UD |   |   | X | BMI instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI] = 0.
Invalid opcode, #UD |   |   | X | VEX.L is 1.
Stack, #SS |   |   | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP |   |   | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   |   | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   |   | X | An unaligned memory reference was performed while alignment checking was enabled.


BLSIC Isolate Lowest Set Bit and Complement


Finds the least significant bit that is set to 1 in the source operand, clears that bit to 0, sets all other bits 
to 1 and writes the result to the destination. If there is no one bit in the source operand, the destination 
is written with all ones.


This instruction has two operands:

BLSIC dest, src 


In 64-bit mode, the operand size is determined by the value of XOP.W. If XOP.W is 1, the operand size 
is 64-bit; if XOP.W is 0, the operand size is 32-bit. In 32-bit mode, XOP.W is ignored. 16-bit operands 
are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is a general purpose register or a memory operand. 


The BLSIC instruction effectively performs a bit-wise logical or of the inverse of the source operand 
and the result of subtracting 1 from the source operand, and stores the result to the destination register: 


sub tmp1, src, 1 
not tmp2, src 
or dest, tmp1, tmp2 


The value of the carry flag of rFLAGS is generated by the sub pseudo-instruction and the remaining 
arithmetic flags are generated by the or pseudo-instruction. 
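
The three pseudo-instructions can be mirrored in Python as a rough sketch. The name `blsic_model` and the `bits` parameter are hypothetical, and only the destination and the carry flag are modeled.

```python
def blsic_model(src, bits=32):
    """Hypothetical model of BLSIC: dest = ~src | (src - 1).

    Returns (dest, cf) for an operand size of `bits` (32 or 64).
    """
    mask = (1 << bits) - 1
    src &= mask
    cf = 1 if src == 0 else 0     # borrow out of the sub pseudo-instruction
    tmp1 = (src - 1) & mask       # sub tmp1, src, 1
    tmp2 = ~src & mask            # not tmp2, src
    dest = tmp1 | tmp2            # or dest, tmp1, tmp2
    return dest, cf


dest, cf = blsic_model(0b10100)
print(hex(dest), cf)              # 0xfffffffb 0: only bit 2 is clear
print(blsic_model(0))             # (4294967295, 1)
```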


The BLSIC instruction is a TBM instruction. Support for this instruction is indicated by CPUID
Fn8000_0001_ECX[TBM] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


Mnemonic | XOP | RXB.map_select | W.vvvv.L.pp | Opcode
BLSIC reg32, reg/mem32 | 8F | RXB.09 | 0.dest.0.00 | 01 /6
BLSIC reg64, reg/mem64 | 8F | RXB.09 | 1.dest.0.00 | 01 /6


Related Instructions 


ANDN, BEXTR, BLCFILL, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR, 
BLSMSK, BSF, BSR, LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X |   | TBM instructions are only recognized in protected mode.
Invalid opcode, #UD |   |   | X | TBM instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[TBM] = 0.
Invalid opcode, #UD |   |   | X | XOP.L is 1.
Stack, #SS |   |   | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP |   |   | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   |   | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   |   | X | An unaligned memory reference was performed while alignment checking was enabled.


BLSMSK Mask From Lowest Set Bit


Forms a mask with bits set to 1 from bit 0 up to and including the least significant bit position that is set 
to 1 in the source operand and writes the mask to the destination. If the value of the source operand is 
zero, the destination is written with all ones. 


This instruction has two operands: 
BLSMSK dest, src 


In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand 
size is 64-bit; if VEX.W is 0, the operand size is 32-bit. In 32-bit mode, VEX.W is ignored. 16-bit 
operands are not supported. 


The destination (dest) is always a general purpose register.
The source operand (src) is either a general purpose register or a memory operand.


This instruction implements the operation: 


sub tmp, src, 1
xor dest, tmp, src


The value of the carry flag is generated by the sub pseudo-instruction and the remaining status flags
are generated by the xor pseudo-instruction.


If the input is zero, the output is a value with all bits set to 1. If this is considered a corner case input, 
software may test the carry flag to detect the zero input value. 
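
The sub/xor sequence, including the carry-flag test for a zero input, can be sketched in Python. The name `blsmsk_model` and the `bits` parameter are hypothetical.

```python
def blsmsk_model(src, bits=32):
    """Hypothetical model of BLSMSK: dest = src ^ (src - 1).

    Returns (dest, cf) for an operand size of `bits` (32 or 64).
    """
    mask = (1 << bits) - 1
    src &= mask
    cf = 1 if src == 0 else 0     # borrow out of the sub pseudo-instruction
    tmp = (src - 1) & mask        # sub pseudo-instruction
    dest = tmp ^ src              # xor pseudo-instruction
    return dest, cf


print(blsmsk_model(0b10100))      # (7, 0): mask covers bits 0 through 2

dest, cf = blsmsk_model(0)
if cf:                            # zero input detected through the carry flag
    print("zero input, mask is", hex(dest))
```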


The BLSMSK instruction is a BMI1 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI1] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding


Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
BLSMSK reg32, reg/mem32 | C4 | RXB.02 | 0.dest.0.00 | F3 /2
BLSMSK reg64, reg/mem64 | C4 | RXB.02 | 1.dest.0.00 | F3 /2


Related Instructions 


ANDN, BEXTR, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR, BSF, BSR, 
LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X |   | BMI instructions are only recognized in protected mode.
Invalid opcode, #UD |   |   | X | BMI instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI] = 0.
Invalid opcode, #UD |   |   | X | VEX.L is 1.
Stack, #SS |   |   | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP |   |   | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   |   | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   |   | X | An unaligned memory reference was performed while alignment checking was enabled.


BLSR Reset Lowest Set Bit


Clears the least-significant bit that is set to 1 in the input operand and writes the modified operand to 
the destination. 


This instruction has two operands: 
BLSR dest, src 


In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand 
size is 64-bit; if VEX.W is 0, the operand size is 32-bit. In 32-bit mode, VEX.W is ignored. 16-bit 
operands are not supported. 


The destination (dest) is always a general purpose register. 
The source operand (src) is either a general purpose register or a memory operand. 


This instruction implements the operation: 


sub tmp, src, 1
and dest, tmp, src


The value of the carry flag is generated by the sub pseudo-instruction and the remaining status flags
are generated by the and pseudo-instruction.
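
As a rough illustration, the sub/and sequence can be modeled in Python, together with one common use: counting set bits by clearing the lowest one until the value is zero. The names `blsr_model` and `count_set_bits` are hypothetical.

```python
def blsr_model(src, bits=32):
    """Hypothetical model of BLSR: dest = src & (src - 1).

    Returns (dest, cf) for an operand size of `bits` (32 or 64).
    """
    mask = (1 << bits) - 1
    src &= mask
    cf = 1 if src == 0 else 0     # borrow out of the sub pseudo-instruction
    tmp = (src - 1) & mask        # sub pseudo-instruction
    dest = tmp & src              # and pseudo-instruction
    return dest, cf


def count_set_bits(value, bits=32):
    """Count set bits by repeatedly resetting the lowest set bit."""
    value &= (1 << bits) - 1
    count = 0
    while value:
        value, _ = blsr_model(value, bits)
        count += 1
    return count


print(blsr_model(0b10100))        # (16, 0)
print(count_set_bits(0b10110))    # 3
```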


The BLSR instruction is a BMI1 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI1] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
BLSR reg32, reg/mem32 | C4 | RXB.02 | 0.dest.0.00 | F3 /1
BLSR reg64, reg/mem64 | C4 | RXB.02 | 1.dest.0.00 | F3 /1


Related Instructions 


ANDN, BEXTR, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSMSK, BSF, BSR, 
LZCNT, POPCNT, T1MSKC, TZCNT, TZMSK


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X |   | BMI instructions are only recognized in protected mode.
Invalid opcode, #UD |   |   | X | BMI instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI] = 0.
Invalid opcode, #UD |   |   | X | VEX.L is 1.
Stack, #SS |   |   | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP |   |   | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   |   | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   |   | X | An unaligned memory reference was performed while alignment checking was enabled.


BOUND Check Array Bound


Checks whether an array index (first operand) is within the bounds of an array (second operand). The 
array index is a signed integer in the specified register. If the operand-size attribute is 16, the array 
operand is a memory location containing a pair of signed word-integers; if the operand-size attribute is
32, the array operand is a pair of signed doubleword-integers. The first word or doubleword specifies 
the lower bound of the array and the second word or doubleword specifies the upper bound. 


The array index must be greater than or equal to the lower bound and less than or equal to the upper 
bound. If the index is not within the specified bounds, the processor generates a BOUND range- 
exceeded exception (#BR). 
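
The check can be sketched in Python as follows. The names `bound_check` and `BoundRangeExceeded` are hypothetical stand-ins for the instruction and the #BR exception, and the sketch assumes the bounds pair is stored in little-endian byte order as two consecutive signed integers.

```python
import struct


class BoundRangeExceeded(Exception):
    """Stands in for the #BR exception in this sketch."""


def bound_check(index, bounds, operand_size=32):
    """Hypothetical model of BOUND.

    `bounds` is a bytes object holding the lower bound followed by the
    upper bound, each a signed 16-bit or 32-bit little-endian integer.
    """
    fmt = "<hh" if operand_size == 16 else "<ii"
    lower, upper = struct.unpack_from(fmt, bounds)
    if index < lower or index > upper:       # both limits are inclusive
        raise BoundRangeExceeded(f"{index} not in [{lower}, {upper}]")


bounds = struct.pack("<ii", 0, 9)            # array with valid indexes 0 through 9
bound_check(9, bounds)                       # within bounds, returns normally
try:
    bound_check(10, bounds)
except BoundRangeExceeded as exc:
    print("range exceeded:", exc)
```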


The bounds of an array, consisting of two words or doublewords containing the lower and upper limits 
of the array, usually reside in a data structure just before the array itself, making the limits addressable 
through a constant offset from the beginning of the array. With the address of the array in a register, 
this practice reduces the number of bus cycles required to determine the effective address of the array 
bounds. 


Using this instruction in 64-bit mode generates an invalid-opcode exception. 


Mnemonic | Opcode | Description
BOUND reg16, mem16&mem16 | 62 /r | Test whether a 16-bit array index is within the bounds specified by the two 16-bit values in mem16&mem16. (Invalid in 64-bit mode.)
BOUND reg32, mem32&mem32 | 62 /r | Test whether a 32-bit array index is within the bounds specified by the two 32-bit values in mem32&mem32. (Invalid in 64-bit mode.)


Related Instructions 


INT, INT3, INTO 


rFLAGS Affected


None


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Bound range, #BR | X | X | X | The bound range was exceeded.
Invalid opcode, #UD | X | X | X | The source operand was a register.
Invalid opcode, #UD |   |   | X | Instruction was executed in 64-bit mode.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   | X | X | An unaligned memory reference was performed while alignment checking was enabled.


BSF Bit Scan Forward


Searches the value in a register or a memory location (second operand) for the least-significant set bit. 
If a set bit is found, the instruction clears the zero flag (ZF) and stores the index of the least-significant 
set bit in a destination register (first operand). If the second operand contains 0, the instruction sets ZF 
to 1 and does not change the contents of the destination register. The bit index is an unsigned offset 
from bit 0 of the searched value. 
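
A small Python sketch of this behavior follows. The name `bsf_model` is hypothetical. The function returns the new destination value together with ZF, and leaves the destination unchanged when the source is zero, as described above.

```python
def bsf_model(src, dest, bits=32):
    """Hypothetical model of BSF. Returns (new_dest, zf)."""
    src &= (1 << bits) - 1
    if src == 0:
        return dest, 1                        # ZF = 1, destination not changed
    index = (src & -src).bit_length() - 1     # position of the least-significant set bit
    return index, 0                           # ZF = 0


print(bsf_model(0b101000, dest=99))           # (3, 0)
print(bsf_model(0, dest=99))                  # (99, 1)
```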


Mnemonic | Opcode | Description
BSF reg16, reg/mem16 | 0F BC /r | Bit scan forward on the contents of reg/mem16.
BSF reg32, reg/mem32 | 0F BC /r | Bit scan forward on the contents of reg/mem32.
BSF reg64, reg/mem64 | 0F BC /r | Bit scan forward on the contents of reg/mem64.


Related Instructions 


BSR 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | U  |    |    |    | U  | M  | U  | U  | U
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   | X | X | An unaligned memory reference was performed while alignment checking was enabled.


BSR Bit Scan Reverse


Searches the value in a register or a memory location (second operand) for the most-significant set bit. 
If a set bit is found, the instruction clears the zero flag (ZF) and stores the index of the most-significant 
set bit in a destination register (first operand). If the second operand contains 0, the instruction sets ZF 
to 1 and does not change the contents of the destination register. The bit index is an unsigned offset 
from bit 0 of the searched value. 
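
The same behavior can be sketched in Python for the reverse scan. The name `bsr_model` is hypothetical, and the function returns the new destination value together with ZF.

```python
def bsr_model(src, dest, bits=32):
    """Hypothetical model of BSR. Returns (new_dest, zf)."""
    src &= (1 << bits) - 1
    if src == 0:
        return dest, 1                # ZF = 1, destination not changed
    index = src.bit_length() - 1      # position of the most-significant set bit
    return index, 0                   # ZF = 0


print(bsr_model(0b101000, dest=99))   # (5, 0)
print(bsr_model(0, dest=99))          # (99, 1)
```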


Mnemonic | Opcode | Description
BSR reg16, reg/mem16 | 0F BD /r | Bit scan reverse on the contents of reg/mem16.
BSR reg32, reg/mem32 | 0F BD /r | Bit scan reverse on the contents of reg/mem32.
BSR reg64, reg/mem64 | 0F BD /r | Bit scan reverse on the contents of reg/mem64.


Related Instructions 


BSF 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | U  |    |    |    | U  | M  | U  | U  | U
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded the data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   | X | X | An unaligned memory reference was performed while alignment checking was enabled.


BSWAP Byte Swap


Reverses the byte order of the specified register. This action converts the contents of the register from 
little endian to big endian or vice versa. In a doubleword, bits 7:0 are exchanged with bits 31:24, and 
bits 15:8 are exchanged with bits 23:16. In a quadword, bits 7:0 are exchanged with bits 63:56, bits 
15:8 with bits 55:48, bits 23:16 with bits 47:40, and bits 31:24 with bits 39:32. A subsequent use of the 
BSWAP instruction with the same operand restores the original value of the operand. 


The result of applying the BSWAP instruction to a 16-bit register is undefined. To swap the bytes of a 
16-bit register, use the XCHG instruction and specify the respective byte halves of the 16-bit register 
as the two operands. For example, to swap the bytes of AX, use XCHG AL, AH. 
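
The byte exchange can be illustrated with a short Python sketch. The names `bswap_model` and `xchg_halves16` are hypothetical. The second function models only the effect of XCHG AL, AH on the value of AX.

```python
def bswap_model(value, bits=32):
    """Hypothetical model of BSWAP for a 32-bit or 64-bit register value."""
    nbytes = bits // 8
    value &= (1 << bits) - 1
    # Serialize in one byte order and reinterpret in the other.
    return int.from_bytes(value.to_bytes(nbytes, "little"), "big")


def xchg_halves16(value):
    """Swap the two bytes of a 16-bit value, as XCHG AL, AH does for AX."""
    value &= 0xFFFF
    return ((value & 0xFF) << 8) | (value >> 8)


print(hex(bswap_model(0x12345678)))                 # 0x78563412
print(hex(bswap_model(bswap_model(0x12345678))))    # 0x12345678
print(hex(xchg_halves16(0x1234)))                   # 0x3412
```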


Mnemonic | Opcode | Description
BSWAP reg32 | 0F C8 +rd | Reverse the byte order of reg32.
BSWAP reg64 | 0F C8 +rq | Reverse the byte order of reg64.


Related Instructions 


XCHG 


rFLAGS Affected 


None 


Exceptions 


None 
BT Bit Test


Copies a bit, specified by a bit index in a register or 8-bit immediate value (second operand), from a bit 
string (first operand), also called the bit base, to the carry flag (CF) of the rFLAGS register. 


If the bit base operand is a register, the instruction uses the modulo 16, 32, or 64 (depending on the 
operand size) of the bit index to select a bit in the register. 


If the bit base operand is a memory location, bit 0 of the byte at the specified address is the bit base of
the bit string. If the bit index is in a register, the instruction selects a bit position relative to the bit base
in the range -2^63 to +2^63 - 1 if the operand size is 64, -2^31 to +2^31 - 1 if the operand size is 32, and
-2^15 to +2^15 - 1 if the operand size is 16. If the bit index is in an immediate value, the bit selected is
that value modulo 16, 32, or 64, depending on operand size.


When the instruction attempts to copy a bit from memory, it accesses 2, 4, or 8 bytes starting from the 
specified memory address for 16-bit, 32-bit, or 64-bit operand sizes, respectively, using the following 
formula: 


Effective Address + (NumBytes_i * (BitOffset DIV NumBits_(i*8)))


When using this bit addressing mechanism, avoid referencing areas of memory close to address space 
holes, such as references to memory-mapped I/O registers. Instead, use a MOV instruction to load a 
register from such an address and use a register form of the BT instruction to manipulate the data. 
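
The register and memory forms of bit selection can be sketched in Python. The names `bt_reg_model` and `bt_mem_model` are hypothetical. The sketch assumes little-endian memory, that DIV rounds toward negative infinity so that negative bit offsets select data below the effective address, and that the bit within the accessed data is the offset modulo the operand size.

```python
def bt_reg_model(value, bit_index, bits=32):
    """Register form: the bit index is taken modulo the operand size."""
    return (value >> (bit_index % bits)) & 1


def bt_mem_model(memory, effective_address, bit_offset, bits=32):
    """Memory form with a signed bit offset taken from a register.

    `memory` is a bytes-like object indexed by address (an assumption of
    this sketch). Returns the value copied to the carry flag.
    """
    nbytes = bits // 8
    # Effective Address + (NumBytes * (BitOffset DIV NumBits)), floor division assumed
    address = effective_address + nbytes * (bit_offset // bits)
    data = int.from_bytes(memory[address:address + nbytes], "little")
    return (data >> (bit_offset % bits)) & 1


memory = bytes([0x00, 0x00, 0x00, 0x80, 0x01, 0x00, 0x00, 0x00])
print(bt_mem_model(memory, 4, 0))     # 1: bit 0 of the doubleword at address 4
print(bt_mem_model(memory, 4, -1))    # 1: bit 31 of the doubleword at address 0
print(bt_reg_model(0b0100, 34))       # 1: index 34 modulo 32 selects bit 2
```

The second call shows why a register bit offset can touch memory well away from the specified address, which is the reason for the caution about address space holes.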


Mnemonic | Opcode | Description
BT reg/mem16, reg16 | 0F A3 /r | Copy the value of the selected bit to the carry flag.
BT reg/mem32, reg32 | 0F A3 /r | Copy the value of the selected bit to the carry flag.
BT reg/mem64, reg64 | 0F A3 /r | Copy the value of the selected bit to the carry flag.
BT reg/mem16, imm8 | 0F BA /4 ib | Copy the value of the selected bit to the carry flag.
BT reg/mem32, imm8 | 0F BA /4 ib | Copy the value of the selected bit to the carry flag.
BT reg/mem64, imm8 | 0F BA /4 ib | Copy the value of the selected bit to the carry flag.


Related Instructions 


BTC, BTR, BTS 


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | U  |    |    |    | U  | U  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   | X | X | An unaligned memory reference was performed while alignment checking was enabled.


BTC Bit Test and Complement


Copies a bit, specified by a bit index in a register or 8-bit immediate value (second operand), from a bit 
string (first operand), also called the bit base, to the carry flag (CF) of the rFLAGS register, and then 
complements (toggles) the bit in the bit string. 


If the bit base operand is a register, the instruction uses the modulo 16, 32, or 64 (depending on the 
operand size) of the bit index to select a bit in the register. 


If the bit base operand is a memory location, bit 0 of the byte at the specified address is the bit base of
the bit string. If the bit index is in a register, the instruction selects a bit position relative to the bit base
in the range -2^63 to +2^63 - 1 if the operand size is 64, -2^31 to +2^31 - 1 if the operand size is 32, and
-2^15 to +2^15 - 1 if the operand size is 16. If the bit index is in an immediate value, the bit selected is
that value modulo 16, 32, or 64, depending on the operand size.


This instruction is useful for implementing semaphores in concurrent operating systems. Such an 
application should precede this instruction with the LOCK prefix. For details about the LOCK prefix, 
see “Lock Prefix” on page 11. 
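
The register form of the operation can be sketched in Python. The name `btc_model` is hypothetical. The sketch models only the bit selection and update; it says nothing about atomicity, which on hardware depends on the LOCK prefix and a memory operand.

```python
def btc_model(value, bit_index, bits=32):
    """Hypothetical model of BTC with a register bit base.

    Returns (new_value, cf), where cf is the original value of the selected bit.
    """
    mask = (1 << bits) - 1
    value &= mask
    pos = bit_index % bits                # register form: index modulo operand size
    cf = (value >> pos) & 1               # copy the selected bit to the carry flag
    return (value ^ (1 << pos)) & mask, cf


print(btc_model(0b0100, 2))               # (0, 1): bit 2 was set, now clear
print(btc_model(0b0000, 34))              # (4, 0): index 34 modulo 32 selects bit 2
```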


Mnemonic | Opcode | Description
BTC reg/mem16, reg16 | 0F BB /r | Copy the value of the selected bit to the carry flag, then complement the selected bit.
BTC reg/mem32, reg32 | 0F BB /r | Copy the value of the selected bit to the carry flag, then complement the selected bit.
BTC reg/mem64, reg64 | 0F BB /r | Copy the value of the selected bit to the carry flag, then complement the selected bit.
BTC reg/mem16, imm8 | 0F BA /7 ib | Copy the value of the selected bit to the carry flag, then complement the selected bit.
BTC reg/mem32, imm8 | 0F BA /7 ib | Copy the value of the selected bit to the carry flag, then complement the selected bit.
BTC reg/mem64, imm8 | 0F BA /7 ib | Copy the value of the selected bit to the carry flag, then complement the selected bit.


Related Instructions 


BT, BTR, BTS 


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | U  |    |    |    | U  | U  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | The destination operand was in a non-writable segment.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   | X | X | An unaligned memory reference was performed while alignment checking was enabled.


BTR Bit Test and Reset


Copies a bit, specified by a bit index in a register or 8-bit immediate value (second operand), from a bit 
string (first operand), also called the bit base, to the carry flag (CF) of the rFLAGS register, and then 
clears the bit in the bit string to 0. 


If the bit base operand is a register, the instruction uses the modulo 16, 32, or 64 (depending on the 
operand size) of the bit index to select a bit in the register. 


If the bit base operand is a memory location, bit 0 of the byte at the specified address is the bit base of
the bit string. If the bit index is in a register, the instruction selects a bit position relative to the bit base
in the range -2^63 to +2^63 - 1 if the operand size is 64, -2^31 to +2^31 - 1 if the operand size is 32, and
-2^15 to +2^15 - 1 if the operand size is 16. If the bit index is in an immediate value, the bit selected is
that value modulo 16, 32, or 64, depending on the operand size.


This instruction is useful for implementing semaphores in concurrent operating systems. Such 
applications should precede this instruction with the LOCK prefix. For details about the LOCK prefix, 
see “Lock Prefix” on page 11. 
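
A minimal Python sketch of the register form follows, using a flag word in which bit 0 marks a pending request. The name `btr_model` and the flag layout are hypothetical, and the sketch does not model the atomicity that the LOCK prefix provides for memory operands.

```python
def btr_model(value, bit_index, bits=32):
    """Hypothetical model of BTR with a register bit base.

    Returns (new_value, cf), where cf is the original value of the selected bit.
    """
    mask = (1 << bits) - 1
    value &= mask
    pos = bit_index % bits                # register form: index modulo operand size
    cf = (value >> pos) & 1               # copy the selected bit to the carry flag
    return value & ~(1 << pos) & mask, cf


flags = 0b0001                            # bit 0 set: a request is pending (illustrative)
flags, was_pending = btr_model(flags, 0)
print(flags, was_pending)                 # 0 1: the flag was set and is now cleared
flags, was_pending = btr_model(flags, 0)
print(flags, was_pending)                 # 0 0: nothing was pending the second time
```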


Mnemonic | Opcode | Description
BTR reg/mem16, reg16 | 0F B3 /r | Copy the value of the selected bit to the carry flag, then clear the selected bit.
BTR reg/mem32, reg32 | 0F B3 /r | Copy the value of the selected bit to the carry flag, then clear the selected bit.
BTR reg/mem64, reg64 | 0F B3 /r | Copy the value of the selected bit to the carry flag, then clear the selected bit.
BTR reg/mem16, imm8 | 0F BA /6 ib | Copy the value of the selected bit to the carry flag, then clear the selected bit.
BTR reg/mem32, imm8 | 0F BA /6 ib | Copy the value of the selected bit to the carry flag, then clear the selected bit.
BTR reg/mem64, imm8 | 0F BA /6 ib | Copy the value of the selected bit to the carry flag, then clear the selected bit.


Related Instructions 


BT, BTC, BTS

rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | U  |    |    |    | U  | U  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.

Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |  |  | X | The destination operand was in a non-writable segment.
General protection, #GP |  |  | X | A null data segment was used to reference memory.
Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled.

BTS Bit Test and Set


Copies a bit, specified by a bit index in a register or 8-bit immediate value (second operand), from a bit
string (first operand), also called the bit base, to the carry flag (CF) of the rFLAGS register, and then
sets the bit in the bit string to 1.


If the bit base operand is a register, the instruction uses the modulo 16, 32, or 64 (depending on the 
operand size) of the bit index to select a bit in the register. 


If the bit base operand is a memory location, bit 0 of the byte at the specified address is the bit base of
the bit string. If the bit index is in a register, the instruction selects a bit position relative to the bit base
in the range -2^63 to +2^63 - 1 if the operand size is 64, -2^31 to +2^31 - 1 if the operand size is 32, and
-2^15 to +2^15 - 1 if the operand size is 16. If the bit index is in an immediate value, the bit selected is
that value modulo 16, 32, or 64, depending on the operand size.


This instruction is useful for implementing semaphores in concurrent operating systems. Such 
applications should precede this instruction with the LOCK prefix. For details about the LOCK prefix, 
see “Lock Prefix” on page 11. 
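
As a rough illustration of the semaphore use described above, the following Python sketch models a lock word whose bits are claimed with an atomic test-and-set and released with an atomic test-and-reset. Everything here is hypothetical scaffolding: the `BitSemaphore` class and its method names are invented for the example, and a `threading.Lock` merely stands in for the atomicity that the LOCK prefix would provide in hardware.

```python
import threading


class BitSemaphore:
    """Toy model of a lock word manipulated by LOCK BTS and LOCK BTR."""

    def __init__(self):
        self.word = 0
        self._atomic = threading.Lock()   # stands in for the LOCK prefix

    def bts(self, index, size=32):
        """Set the selected bit and return its previous value (the CF result)."""
        with self._atomic:
            bit = index % size
            cf = (self.word >> bit) & 1
            self.word |= 1 << bit
            return cf

    def btr(self, index, size=32):
        """Clear the selected bit and return its previous value (the CF result)."""
        with self._atomic:
            bit = index % size
            cf = (self.word >> bit) & 1
            self.word &= ~(1 << bit)
            return cf

    def try_acquire(self, index):
        """CF = 0 after BTS means the bit was free and is now owned by the caller."""
        return self.bts(index) == 0

    def release(self, index):
        self.btr(index)
```

A caller that sees `try_acquire` return `False` found the bit already set and would retry or block, depending on the design.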


Mnemonic | Opcode | Description
BTS reg/mem16, reg16 | 0F AB /r | Copy the value of the selected bit to the carry flag, then set the selected bit.
BTS reg/mem32, reg32 | 0F AB /r | Copy the value of the selected bit to the carry flag, then set the selected bit.
BTS reg/mem64, reg64 | 0F AB /r | Copy the value of the selected bit to the carry flag, then set the selected bit.
BTS reg/mem16, imm8 | 0F BA /5 ib | Copy the value of the selected bit to the carry flag, then set the selected bit.
BTS reg/mem32, imm8 | 0F BA /5 ib | Copy the value of the selected bit to the carry flag, then set the selected bit.
BTS reg/mem64, imm8 | 0F BA /5 ib | Copy the value of the selected bit to the carry flag, then set the selected bit.


Related Instructions 


BT, BTC, BTR

rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | U  |    |    |    | U  | U  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.

Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |  |  | X | The destination operand was in a non-writable segment.
General protection, #GP |  |  | X | A null data segment was used to reference memory.
Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled.

BZHI Zero High Bits


Copies bits, left to right, from the first source operand starting with the bit position specified by the 
second source operand (index), writes these bits to the destination, and clears all the bits in positions 
greater than or equal to index. 


This instruction has three operands:

BZHI dest, src, index

In 64-bit mode, the operand size (op_size) is determined by the value of VEX.W. If VEX.W is 1, the
operand size is 64 bits; if VEX.W is 0, the operand size is 32 bits. In 32-bit mode, VEX.W is ignored.
16-bit operands are not supported.


The destination (dest) is a general purpose register. The first source operand (src) is either a general
purpose register or a memory operand. The second source operand is a general purpose register. Bits
[7:0] of this register, treated as an unsigned 8-bit integer, specify the index of the most-significant bit
of the first source operand to be copied to the corresponding bit of the destination. Bits
[op_size-1:index] of the destination are cleared.


If the value of index is greater than or equal to the operand size, index is set to (op_size-1). In this case, 
the CF flag is set. 


This instruction is a BMI2 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 
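
The operation described above can be modeled with a few lines of Python. This is a sketch under stated assumptions rather than a definition: the function name `bzhi` is invented for the example, and the out-of-range case is modeled as copying the whole source unchanged with CF set, which is one reading of the saturation rule given above. Flags other than CF are not modeled.

```python
def bzhi(src, index_reg, op_size=32):
    """Return (dest, CF) for a simple model of BZHI.

    src       -- first source operand
    index_reg -- second source operand; only bits [7:0] are used
    op_size   -- 32 or 64
    """
    mask = (1 << op_size) - 1
    index = index_reg & 0xFF                 # bits [7:0], treated as unsigned
    if index >= op_size:
        return src & mask, 1                 # assumed: nothing is cleared, CF is set
    return src & ((1 << index) - 1), 0       # clear bits [op_size-1:index]
```

For instance, `bzhi(0xFFFFFFFF, 8)` keeps only the low eight bits in this model.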


Encoding

Mnemonic | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
BZHI reg32, reg/mem32, reg32 | C4 | RXB.02 | 0.index.0.00 | F5 /r
BZHI reg64, reg/mem64, reg64 | C4 | RXB.02 | 1.index.0.00 | F5 /r

Related Instructions

rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.

Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X |  | BMI2 instructions are only recognized in protected mode.
Invalid opcode, #UD |  |  | X | BMI2 instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI2] = 0.
Invalid opcode, #UD |  |  | X | VEX.L is 1.
Stack, #SS |  |  | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP |  |  | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |  |  | X | A null data segment was used to reference memory.
Page fault, #PF |  |  | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |  |  | X | An unaligned memory reference was performed while alignment checking was enabled.

CALL (Near) Near Procedure Call


Pushes the offset of the next instruction onto the stack and branches to the target address, which 
contains the first instruction of the called procedure. The target operand can specify a register, a 
memory location, or a label. A procedure accessed by a near CALL is located in the same code 
segment as the CALL instruction. 


If the CALL target is specified by a register or memory location, then a 16-, 32-, or 64-bit rIP is read 
from the operand, depending on the operand size. A 16- or 32-bit rIP is zero-extended to 64 bits. 


If the CALL target is specified by a displacement, the signed displacement is added to the rIP (of the 
following instruction), and the result is truncated to 16, 32, or 64 bits, depending on the operand size. 
The signed displacement is 16 or 32 bits, depending on the operand size. 


In all cases, the rIP of the instruction after the CALL is pushed on the stack, and the size of the stack 
push (16, 32, or 64 bits) depends on the operand size of the CALL instruction. 


For near calls in 64-bit mode, the operand size defaults to 64 bits. The E8 opcode results in 
RIP = RIP + 32-bit signed displacement and the FF /2 opcode results in RIP = 64-bit offset from 
register or memory. No prefix is available to encode a 32-bit operand size in 64-bit mode. 


At the end of the called procedure, RET is used to return control to the instruction following the 
original CALL. When RET is executed, the rIP is popped off the stack, which returns control to the 
instruction after the CALL. 


See CALL (Far) for information on far calls—calls to procedures located outside of the current code 
segment. For details about control-flow instructions, see “Control Transfers” in Volume 1, and 
“Control-Transfer Privilege Checks” in Volume 2. 
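
The target calculation described above can be sketched as follows. This is a simplified Python model, not the architectural definition: the function names and the use of a Python list as the stack are assumptions for illustration, and exception checks (stack limits, canonical addresses, segment limits) are deliberately left out.

```python
def near_call_target(next_rip, operand_size, rel=None, absolute=None):
    """Return the new rIP for a near CALL.

    operand_size is 16, 32, or 64. Pass either `rel` (a signed displacement)
    or `absolute` (the value read from a register or memory operand).
    """
    mask = (1 << operand_size) - 1
    if absolute is not None:
        return absolute & mask           # a 16- or 32-bit rIP is zero-extended to 64 bits
    return (next_rip + rel) & mask       # add the displacement, then truncate to operand size


def push_return_address(stack, next_rip, operand_size):
    """Push the rIP of the instruction after the CALL as an operand-sized value."""
    stack.append(next_rip & ((1 << operand_size) - 1))
```

With a 16-bit operand size, `near_call_target(0xFFFE, 16, rel=4)` wraps around to offset 2 because the result is truncated to 16 bits.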


Mnemonic | Opcode | Description
CALL rel16off | E8 iw | Near call with the target specified by a 16-bit relative displacement.
CALL rel32off | E8 id | Near call with the target specified by a 32-bit relative displacement.
CALL reg/mem16 | FF /2 | Near call with the target specified by reg/mem16.
CALL reg/mem32 | FF /2 | Near call with the target specified by reg/mem32. (There is no prefix for encoding this in 64-bit mode.)
CALL reg/mem64 | FF /2 | Near call with the target specified by reg/mem64.


For details about control-flow instructions, see “Control Transfers” in Volume 1, and
“Control-Transfer Privilege Checks” in Volume 2.

Action

// For function ShadowStacksEnabled()
// see "Pseudocode Definition" on page 57

CALLN_START:

IF (OPCODE == calln abs [mem])          // CALLN, abs indirect
    temp_RIP = READ_MEM.z [mem]
ELSE                                    // CALLN, rel/abs direct
    temp_RIP = z-sized instruction offset field, zero-extended to 64 bits

IF (OPCODE == calln rel)                // if relative, add offset to rIP
    temp_RIP = temp_RIP + RIP.v

IF (stack is not large enough for a v-sized push)
    EXCEPTION[#SS(0)]

PUSH.v next_RIP

IF ((64BIT_MODE) && (temp_RIP is non-canonical) ||
    (!64BIT_MODE) && (temp_RIP > CS.limit))
    EXCEPTION[#GP(0)]

IF ((ShadowStacksEnabled(current CPL)) && (OPCODE != calln +0))
{
    IF (v == 2)                         // operand size = 16
    {
        SSTK_WRITE_MEM.d [SSP-4] = next_IP
        SSP = SSP - 4
    }
    ELSEIF (v == 4)                     // operand size = 32
    {
        SSTK_WRITE_MEM.d [SSP-4] = next_EIP
        SSP = SSP - 4
    }
    ELSE // (v == 8)                    // operand size = 64
    {
        SSTK_WRITE_MEM.q [SSP-8] = next_RIP
        SSP = SSP - 8
    }
}   // end shadow stacks enabled

RIP = temp_RIP

EXIT


Related Instructions 


CALL (Far), RET (Near), RET (Far)

rFLAGS Affected

None.

Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | X | X | X | The target offset exceeded the code segment limit or was non-canonical.
General protection, #GP |  |  | X | A null data segment was used to reference memory.
Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled.
Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction.

CALL (Far) Far Procedure Call


Pushes procedure linking information onto the stack and branches to the target address, which contains 
the first instruction of the called procedure. The operand specifies a target selector and offset. 


The instruction can specify the target directly, by including the far pointer in the immediate and 
displacement fields of the instruction, or indirectly, by referencing a far pointer in memory. In 64-bit 
mode, only indirect far calls are allowed; executing a direct far call (opcode 9A) generates an 
undefined opcode exception. For both direct and indirect far calls, if the CALL (Far) operand-size is 
16 bits, the instruction's operand is a 16-bit offset followed by a 16-bit selector. If the operand-size is 
32 or 64 bits, the operand is a 32-bit offset followed by a 16-bit selector. 
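
The far-pointer layout described above (offset first, then a 16-bit selector) can be illustrated by decoding the raw bytes of such an operand. The Python sketch below is only an illustration: the helper name `decode_far_pointer` is hypothetical, and little-endian byte order is assumed because that is how x86 stores multi-byte values.

```python
import struct


def decode_far_pointer(raw, operand_size):
    """Split the bytes of a far pointer into (selector, offset).

    A 16-bit operand size uses a 16-bit offset; a 32- or 64-bit operand size
    uses a 32-bit offset. The 16-bit selector follows the offset in both cases.
    """
    fmt = "<HH" if operand_size == 16 else "<IH"   # little-endian, no padding
    offset, selector = struct.unpack_from(fmt, raw)
    return selector, offset
```

For a 32-bit operand size the pointer therefore occupies six bytes: four for the offset and two for the selector.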


The target selector used by the instruction can be a code selector in all modes. Additionally, the target 
selector can reference a call gate in protected mode, or a task gate or TSS selector in legacy protected 
mode. 


• Target is a code selector: The CS:rIP of the next instruction is pushed to the stack, using
operand-size stack pushes. Then code is executed from the target CS:rIP. In this case, the target
offset can only be a 16- or 32-bit value, depending on operand-size, and is zero-extended to 64 bits.
No CPL change is allowed.


• Target is a call gate: The call gate specifies the actual target code segment and offset. Call gates
allow calls to the same or more privileged code. If the target segment is at the same CPL as the
current code segment, the CS:rIP of the next instruction is pushed to the stack.


If the CALL (Far) changes privilege level, then a stack-switch occurs, using an inner-level stack 
pointer from the TSS. The CS:rIP of the next instruction is pushed to the new stack. If the mode is 
legacy mode and the param-count field in the call gate is non-zero, then up to 31 operands are 
copied from the caller's stack to the new stack. Finally, the caller's SS:rSP is pushed to the new 
stack. 


When calling through a call gate, the stack pushes are 16, 32, or 64 bits, depending on the size of
the call gate. The size of the target rIP is also 16, 32, or 64 bits, depending on the size of the call
gate. If the target rIP is less than 64 bits, it is zero-extended to 64 bits. Long mode only allows
64-bit call gates that must point to 64-bit code segments.


• Target is a task gate or a TSS: If the mode is legacy protected mode, then a task switch occurs.
See “Hardware Task-Management in Legacy Mode” in Volume 2 for details about task switches.
Hardware task switches are not supported in long mode.


See CALL (Near) for information on near calls (calls to procedures located inside the current code
segment). For details about control-flow instructions, see “Control Transfers” in Volume 1, and
“Control-Transfer Privilege Checks” in Volume 2.

Mnemonic | Opcode | Description
CALL FAR pntr16:16 | 9A cd | Far call direct, with the target specified by a far pointer contained in the instruction. (Invalid in 64-bit mode.)
CALL FAR pntr16:32 | 9A cp | Far call direct, with the target specified by a far pointer contained in the instruction. (Invalid in 64-bit mode.)
CALL FAR mem16:16 | FF /3 | Far call indirect, with the target specified by a far pointer in memory.
CALL FAR mem16:32 | FF /3 | Far call indirect, with the target specified by a far pointer in memory.
Action 


// For functions READ_DESCRIPTOR, READ_INNER_LEVEL_SP, 
// ShadowStacksEnabled and SET_TOKEN_BUSY see "Pseudocode Definition" 
// on page 57 


CALLF_START:

IF (REAL_MODE)
    CALLF_REAL_OR_VIRTUAL               // CALLF real mode
ELSEIF (PROTECTED_MODE)
    CALLF_PROTECTED                     // CALLF protected mode
ELSE                                    // virtual mode
    CALLF_REAL_OR_VIRTUAL               // CALLF virtual mode

CALLF_REAL_OR_VIRTUAL:

IF (OPCODE == callf [mem])              // CALLF real mode, indirect
{
    temp_RIP = READ_MEM.z [mem]
    temp_CS = READ_MEM.w [mem+Z]
}
ELSE                                    // CALLF real mode, direct
{
    temp_RIP = z-sized instruction offset field, zero-extended to 64 bits
    temp_CS = selector specified in the instruction
}

PUSH.v old_CS
PUSH.v next_RIP

IF (temp_RIP > CS.limit)
    EXCEPTION [#GP(0)]

CS.sel = temp_CS
CS.base = temp_CS SHL 4
RIP = temp_RIP

EXIT                                    // end CALLF real or virtual

CALLF_PROTECTED:


IF (OPCODE == callf [mem])              // CALLF protected mode, indirect
{
    temp_offset = READ_MEM.z [mem]
    temp_sel = READ_MEM.w [mem+Z]
}
ELSE                                    // CALLF protected mode, direct
{
    IF (64BIT_MODE)
        EXCEPTION [#UD]                 // CALLF direct is illegal in 64-bit mode.
    temp_offset = z-sized instruction offset field, zero-extended to 64 bits
    temp_sel = selector specified in the instruction
}

temp_desc = READ_DESCRIPTOR (temp_sel, cs_chk)
IF (temp_desc.attr.type == 'available_tss')
    TASK_SWITCH                         // Using temp_sel as the target TSS
ELSEIF (temp_desc.attr.type == 'taskgate')
    TASK_SWITCH                         // Using the TSS selector in the task gate as the target TSS
ELSEIF (temp_desc.attr.type == 'callgate')
    CALLF_CALLGATE                      // CALLF through callgate
ELSE // (temp_desc.attr.type == 'code')
{                                       // the selector refers to a code descriptor
    temp_RIP = temp_offset              // the target RIP is the instruction offset field
    CS = temp_desc
    PUSH.v old_CS
    PUSH.v next_RIP

    IF ((!64BIT_MODE) && (temp_RIP > CS.limit))
        EXCEPTION [#GP(0)]              // temp_RIP can't be non-canonical because it's a
                                        // 16- or 32-bit offset, zero-extended to 64 bits
    RIP = temp_RIP

    IF ShadowStacksEnabled at current CPL
    {
        IF (v == 2)
            temp_LIP = CS.base + next_IP        // operand size = 16
        ELSEIF (v == 4)
            temp_LIP = CS.base + next_EIP       // operand size = 32
        ELSE // (v == 8)
            temp_LIP = next_RIP                 // operand size = 64

        IF EFER.LMA && (temp_desc.attr.L == 0) && (SSP[63:32] != 0)
            EXCEPTION [#GP(0)]                  // SSP must be <4 GB

        Align SSP to 8B boundary, storing 4B of 0 if needed
        old_SSP = SSP
        SSTK_WRITE_MEM.q [SSP-16] = old_CS      // push CS, LIP, SSP
        SSTK_WRITE_MEM.q [SSP-8] = temp_LIP     // onto the shadow stack
        SSTK_WRITE_MEM.q [SSP] = old_SSP
        SSP = SSP - 24
    }

    EXIT
}   // end CALLF selector=code segment
CALLF_CALLGATE:

IF (LONG_MODE)                          // the gate size controls the size of the stack pushes
    v=8-byte                            // Long mode only uses 64-bit call gates, force 8-byte opsize
ELSEIF (temp_desc.attr.type == 'callgate32')
    v=4-byte                            // Legacy mode, using a 32-bit call-gate, force 4-byte
ELSE // (temp_desc.attr.type == 'callgate16')
    v=2-byte                            // Legacy mode, using a 16-bit call-gate, force 2-byte opsize

// the target CS and RIP both come from the call gate.
temp_RIP = temp_desc.offset

IF (LONG_MODE)
{                                       // read 2nd half of 16-byte call-gate
    temp_upper = READ_MEM.q [temp_sel+8]            // to get upper 32 bits of target RIP
    IF (temp_upper's extended attribute bits != 0)
        EXCEPTION [#GP(temp_sel)]
    temp_RIP = temp_RIP + (temp_upper SHL 32)       // Concatenate both halves of RIP
}

CS = READ_DESCRIPTOR (temp_desc.segment, callgate_check)

IF ((64BIT_MODE) && (temp_RIP is non-canonical) ||
    (!64BIT_MODE) && (temp_RIP > CS.limit))
    EXCEPTION[#GP(0)]

IF (CS.attr.conforming == 1)
    temp_CPL = CPL
ELSE
    temp_CPL = CS.attr.dpl

IF (CPL == temp_CPL)                    // CALLF through gate, to same privilege
{
    PUSH.v old_CS
    PUSH.v next_RIP
    RIP = temp_RIP

    IF (ShadowStacksEnabled at current CPL)
    {
        IF (v == 2)
            temp_LIP = CS.base + next_IP        // operand size = 16
        ELSEIF (v == 4)
            temp_LIP = CS.base + next_EIP       // operand size = 32
        ELSE // (v == 8)
            temp_LIP = next_RIP                 // operand size = 64

        IF ((EFER.LMA && (temp_desc.attr.L == 0)) && (SSP[63:32] != 0))
            EXCEPTION [#GP(0)]                  // SSP must be <4 GB

        Align SSP to next 8B boundary, storing 4B of 0 if needed
        old_SSP = SSP
        SSTK_WRITE_MEM.q [SSP-24] = old_CS      // push CS, LIP, SSP
        SSTK_WRITE_MEM.q [SSP-16] = temp_LIP    // onto the shadow stack
        SSTK_WRITE_MEM.q [SSP-8] = old_SSP
        SSP = SSP - 24
    }   // end shadow stacks enabled

    EXIT                                // end CALLF through gate, to same privilege
}
ELSE                                    // CALLF through gate, to more privilege
{
    old_CPL = CPL
    CPL = temp_CPL
    temp_ist = 0                        // CALLF doesn't use IST pointers.
    temp_SS_desc:temp_RSP = READ_INNER_LEVEL_SP(CPL, temp_ist)
    RSP.q = temp_RSP
    SS = temp_SS_desc

    PUSH.v old_SS                       // #SS on this or next pushes use SS.sel as error code
    PUSH.v old_RSP

    IF (LEGACY_MODE)                    // Legacy-mode call gates have a param_count field
    {
        temp_PARAM_COUNT = temp_desc.attr.param_count
        FOR (I=temp_PARAM_COUNT; I>0; I--)
        {
            temp_DATA = READ_MEM.v [old_SS:(old_RSP+I*V)]
            PUSH.v temp_DATA
        }
    }

    PUSH.v old_CS
    PUSH.v next_RIP
    RIP = temp_RIP

    IF ((ShadowStacksEnabled at CPL=3) && (old_CPL == 3))
        PL3_SSP = SSP

    IF (ShadowStacksEnabled at new CPL)
    {
        old_SSP = SSP
        SSP = PLn_SSP                   // where n=new CPL
        SET_SSTK_TOKEN_BUSY(SSP)        // check for valid token and set busy bit

        IF old_CPL != 3
        {                               // push CS,LIP,SSP onto sstk
            SSTK_WRITE_MEM.q [SSP-24] = old_CS      // push CS
            SSTK_WRITE_MEM.q [SSP-16] = temp_LIP    // LIP and
            SSTK_WRITE_MEM.q [SSP-8] = old_SSP      // SSP to the shadow stack
            SSP = SSP - 24
        }
    }   // end shadow stacks enabled at new CPL

    EXIT
}   // end CALLF to more priv


Related Instructions 


CALL (Near), RET (Near), RET (Far) 


rFLAGS Affected 


None, unless a task switch occurs, in which case all flags are modified. 


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The far CALL indirect opcode (FF /3) had a register operand.
Invalid opcode, #UD |  |  | X | The far CALL direct opcode (9A) was executed in 64-bit mode.
Invalid TSS, #TS (selector) |  |  | X | As part of a stack switch, the target stack segment selector or rSP in the TSS was beyond the TSS limit.
Invalid TSS, #TS (selector) |  |  | X | As part of a stack switch, the target stack segment selector in the TSS was a null selector.
Invalid TSS, #TS (selector) |  |  | X | As part of a stack switch, the target stack selector’s TI bit was set, but LDT selector was a null selector.
Invalid TSS, #TS (selector) |  |  | X | As part of a stack switch, the target stack segment selector in the TSS was beyond the limit of the GDT or LDT descriptor table.
Invalid TSS, #TS (selector) |  |  | X | As part of a stack switch, the target stack segment selector in the TSS contained a RPL that was not equal to its DPL.
Invalid TSS, #TS (selector) |  |  | X | As part of a stack switch, the target stack segment selector in the TSS contained a DPL that was not equal to the CPL of the code segment selector.
Invalid TSS, #TS (selector) |  |  | X | As part of a stack switch, the target stack segment selector in the TSS was not a writable segment.
Segment not present, #NP (selector) |  |  | X | The accessed code segment, call gate, task gate, or TSS was not present.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical, and no stack switch occurred.
Stack, #SS (selector) |  |  | X | After a stack switch, a memory access exceeded the stack segment limit or was non-canonical.
Stack, #SS (selector) |  |  | X | As part of a stack switch, the SS register was loaded with a non-null segment selector and the segment was marked not present.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | X | X | X | The target offset exceeded the code segment limit or was non-canonical.
General protection, #GP |  |  | X | A null data segment was used to reference memory.
General protection, #GP (selector) |  |  | X | The target code segment selector was a null selector.
General protection, #GP (selector) |  |  | X | A code, call gate, task gate, or TSS descriptor exceeded the descriptor table limit.
General protection, #GP (selector) |  |  | X | A segment selector’s TI bit was set but the LDT selector was a null selector.
General protection, #GP (selector) |  |  | X | The segment descriptor specified by the instruction was not a code segment, task gate, call gate or available TSS in legacy mode, or not a 64-bit code segment or a 64-bit call gate in long mode.
General protection, #GP (selector) |  |  | X | The RPL of the non-conforming code segment selector specified by the instruction was greater than the CPL, or its DPL was not equal to the CPL.
General protection, #GP (selector) |  |  | X | The DPL of the conforming code segment descriptor specified by the instruction was greater than the CPL.
General protection, #GP (selector) |  |  | X | The DPL of the callgate, taskgate, or TSS descriptor specified by the instruction was less than the CPL, or less than its own RPL.
General protection, #GP (selector) |  |  | X | The segment selector specified by the call gate or task gate was a null selector.
General protection, #GP (selector) |  |  | X | The segment descriptor specified by the call gate was not a code segment in legacy mode, or not a 64-bit code segment in long mode.
General protection, #GP (selector) |  |  | X | The DPL of the segment descriptor specified by the call gate was greater than the CPL.
General protection, #GP (selector) |  |  | X | The 64-bit call gate’s extended attribute bits were not zero.
General protection, #GP (selector) |  |  | X | The TSS descriptor was found in the LDT.
Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled.

CBW Convert to Sign-Extended
CWDE
CDQE


Copies the sign bit in the AL or eAX register to the upper bits of the rAX register. The effect of this 
instruction is to convert a signed byte, word, or doubleword in the AL or eAX register into a signed 
word, doubleword, or quadword in the rAX register. This action helps avoid overflow problems in 
signed number arithmetic. 


The CDQE mnemonic is meaningful only in 64-bit mode. 
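
The three conversions differ only in the source and destination widths. The following Python sketch models them on unsigned integers; the helper `sign_extend` and the wrapper names are assumptions for illustration, and register values are represented as plain non-negative integers.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign-extend the low `from_bits` of value to `to_bits` (unsigned result)."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):                      # sign bit is set
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def cbw(al):
    return sign_extend(al, 8, 16)      # AL -> AX


def cwde(ax):
    return sign_extend(ax, 16, 32)     # AX -> EAX


def cdqe(eax):
    return sign_extend(eax, 32, 64)    # EAX -> RAX
```

For example, a byte value of 0x80 (that is, -128) becomes the word 0xFF80, which is still -128 when read as a signed 16-bit value.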


Mnemonic Opcode Description 

CBW 98 Sign-extend AL into AX. 
CWDE 98 Sign-extend AX into EAX. 
CDQE 98 Sign-extend EAX into RAX. 


Related Instructions 
CWD, CDQ, CQO 
rFLAGS Affected 


None 


Exceptions 


None

CWD Convert to Sign-Extended
CDQ 
CQO 


Copies the sign bit in the rAX register to all bits of the rDX register. The effect of this instruction is to 
convert a signed word, doubleword, or quadword in the rAX register into a signed doubleword, 
quadword, or double-quadword in the rDX:rAX registers. This action helps avoid overflow problems 
in signed number arithmetic. 


The CQO mnemonic is meaningful only in 64-bit mode. 
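
In other words, rDX receives either all zeros or all ones depending on the sign bit of rAX, and rAX itself is unchanged. A minimal Python model of this behavior is shown below; the function names are invented for the example and each function returns the pair (rDX, rAX) as unsigned integers.

```python
def sign_fill(acc, bits):
    """Return the value written to rDX: all ones if the sign bit of acc is set, else zero."""
    mask = (1 << bits) - 1
    return mask if (acc >> (bits - 1)) & 1 else 0


def cwd(ax):
    return sign_fill(ax, 16), ax & 0xFFFF                    # DX:AX


def cdq(eax):
    return sign_fill(eax, 32), eax & 0xFFFFFFFF              # EDX:EAX


def cqo(rax):
    return sign_fill(rax, 64), rax & 0xFFFFFFFFFFFFFFFF      # RDX:RAX
```

These conversions are commonly placed before a signed divide, which takes its dividend from the rDX:rAX pair.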


Mnemonic Opcode Description 

CWD 99 Sign-extend AX into DX:AX. 
CDQ 99 Sign-extend EAX into EDX:EAX. 
CQO 99 Sign-extend RAX into RDX:RAX. 


Related Instructions 
CBW, CWDE, CDQE 
rFLAGS Affected 
None 


Exceptions 


None

CLC Clear Carry Flag

Clears the carry flag (CF) in the rFLAGS register to zero.


Mnemonic Opcode Description 
CLC F8 Clear the carry flag (CF) to zero. 
Related Instructions 
STC, CMC 
rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       |    |    |    |    |    |    |    |    | 0
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.

Exceptions

None

CLD Clear Direction Flag


Clears the direction flag (DF) in the rFLAGS register to zero. If the DF flag is 0, each iteration of a 
string instruction increments the data pointer (index registers rSI or rDI). If the DF flag is 1, the string 
instruction decrements the pointer. Use the CLD instruction before a string instruction to make the 
data pointer increment. 
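
The effect of DF on a string instruction can be illustrated with a small Python model of a repeated byte move. This is a sketch only: the function name `movsb` and its parameters are assumptions for the example, memory is a `bytearray`, and segment handling and operand sizes other than a byte are ignored.

```python
def movsb(memory, rsi, rdi, count, df=0):
    """Copy `count` bytes from memory[rsi] to memory[rdi], one byte per iteration.

    With df=0 both pointers increment after each byte; with df=1 they decrement.
    Returns the final (rsi, rdi) values.
    """
    step = -1 if df else 1
    for _ in range(count):
        memory[rdi] = memory[rsi]
        rsi += step
        rdi += step
    return rsi, rdi
```

With `df=0` the copy walks upward from the starting addresses, which is the behavior most code expects and the reason CLD is typically executed first.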


Mnemonic Opcode Description 


CLD FC Clear the direction flag (DF) to zero. 
Related Instructions 
CMPSx, INSx, LODSx, MOVSx, OUTSx, SCASx, STD, STOSx 
rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       |    | 0  |    |    |    |    |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.

Exceptions

None

CLFLUSH Cache Line Flush


Flushes the cache line specified by the mem8 linear-address. The instruction checks all levels of the 
cache hierarchy—internal caches and external caches—and invalidates the cache line in every cache 
in which it is found. If a cache contains a dirty copy of the cache line (that is, the cache line is in the 
modified or owned MOESI state), the line is written back to memory before it is invalidated. The 
instruction sets the cache-line MOESI state to invalid. 


The instruction also checks the physical address corresponding to the linear-address operand against 
the processor’s write-combining buffers. If the write-combining buffer holds data intended for that 
physical address, the instruction writes the entire contents of the buffer to memory. This occurs even 
though the data is not cached in the cache hierarchy. In a multiprocessor system, the instruction checks 
the write-combining buffers only on the processor that executed the CLFLUSH instruction. 


On processors that do not support the CLFLUSHOPT instruction, (CPUID Fn 
0000_0007_EBX_x0[CLFLOPT]=0), the CLFLUSH instruction is weakly ordered with respect to 
other instructions that operate on memory. Speculative loads initiated by the processor, or specified 
explicitly using cache-prefetch instructions, can be reordered around a CLFLUSH instruction. Such 
reordering can invalidate a speculatively prefetched cache line, unintentionally defeating the prefetch 
operation. The only way to avoid this situation is to use the MFENCE instruction after the CLFLUSH 
instruction to force strong-ordering of the CLFLUSH instruction with respect to subsequent memory 
operations. The CLFLUSH instruction may also take effect on a cache line while stores from previous 
store instructions are still pending in the store buffer. To ensure that such stores are included in the 
cache line that is flushed, use an MFENCE instruction ahead of the CLFLUSH instruction. Such stores 
would otherwise cause the line to be re-cached and modified after the CLFLUSH completed. The 
LFENCE, SFENCE, and serializing instructions are not ordered with respect to CLFLUSH. 


On processors that support CLFLUSHOPT, (CPUID Fn 0000_0007_EBX_x0[CLFLOPT]=1), 
CLFLUSH is ordered with respect to locked operations, fence instructions, and CLFLUSHOPT, 
CLFLUSH and write instructions that touch the same cache line. CLFLUSH is not ordered with 
CLFLUSHOPT, CLFLUSH and write instructions to other cache lines. 


The CLFLUSH instruction behaves like a load instruction with respect to setting the page-table 
accessed and dirty bits. That is, it sets the page-table accessed bit to 1, but does not set the page-table 
dirty bit. 


The CLFLUSH instruction executes at any privilege level. CLFLUSH performs all the segmentation 
and paging checks that a 1-byte read would perform, except that it also allows references to execute- 
only segments. 


The CLFLUSH instruction is supported if the feature flag CPUID Fn0000_0001_EDX[CLFSH] is set. 
The 8-bit field CPUID Fn 0000_0001_EBX[CLFlush] returns the size of the cacheline in quadwords. 
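
As an illustration of how software might interpret these CPUID results, the Python sketch below decodes raw register values that have already been obtained from CPUID function 0000_0001h. The bit positions used here (CLFSH at EDX bit 19 and the CLFlush field at EBX bits 15:8) are not stated in the text above and should be treated as assumptions to be checked against the CPUID reference; the function names are hypothetical.

```python
def has_clfsh(edx):
    """True if the CLFSH feature flag is set (assumed position: EDX bit 19)."""
    return bool((edx >> 19) & 1)


def clflush_line_size_bytes(ebx):
    """Cache-line size in bytes from the CLFlush field (assumed position: EBX bits 15:8).

    The field reports the size in quadwords, and one quadword is 8 bytes.
    """
    quadwords = (ebx >> 8) & 0xFF
    return quadwords * 8
```

A field value of 8 quadwords corresponds to a 64-byte cache line under this interpretation.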


For more information on using the CPUID instruction, see the instruction reference page for the
CPUID instruction on page 165. For a description of all feature flags related to instruction subset
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587.

Mnemonic | Opcode | Description
CLFLUSH mem8 | 0F AE /7 | Flush cache line containing mem8.

Related Instructions

INVD, WBINVD, CLFLUSHOPT, CLZERO

rFLAGS Affected

None

Exceptions

Exception (vector) | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | CLFLUSH instruction is not supported, as indicated by CPUID Fn0000_0001_EDX[CLFSH] = 0.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |  |  | X | A null data segment was used to reference memory.
Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction.

CLFLUSHOPT Optimized Cache Line Flush


Flushes the cache line specified by the mem8 linear-address. The instruction checks all levels of the
cache hierarchy (internal caches and external caches) and invalidates the cache line in every cache in
which it is found. If a cache contains a dirty copy of the cache line (that is, the cache line is in the
modified or owned MOESI state), the line is written back to memory before it is invalidated. The
instruction sets the cache-line MOESI state to invalid.


The instruction also checks the physical address corresponding to the linear-address operand against 
the processor's write-combining buffers. If the write-combining buffer holds data intended for that 
physical address, the instruction writes the entire contents of the buffer to memory. This occurs even 
though the data is not cached in the cache hierarchy. In a multiprocessor system, the instruction checks 
the write-combining buffers only on the processor that executed the CLFLUSHOPT instruction. 


The CLFLUSHOPT instruction is ordered with respect to fence instructions and locked operations.
CLFLUSHOPT is also ordered with writes, CLFLUSH, and CLFLUSHOPT instructions that
reference the same cache line as the CLFLUSHOPT. CLFLUSHOPT is not ordered with writes,
CLFLUSH, and CLFLUSHOPT to other cache lines. To enforce ordering in that situation, use an SFENCE
instruction or a stronger fence.
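
The ordering rule can be pictured with a small Python sketch that lists the instruction sequence for flushing every cache line touched by a buffer and then closes the sequence with one SFENCE. This is only a model that emits text: the helper name, the 64-byte default line size, and the address format are assumptions, and the line size must be a power of two.

```python
def flush_range_sequence(start, length, line_size=64):
    """List the instructions that flush every cache line touched by a buffer.

    Assumptions: line_size is a power of two; addresses are plain integers.
    """
    if length <= 0:
        return []
    first = start & ~(line_size - 1)                # align down to a line boundary
    last = (start + length - 1) & ~(line_size - 1)  # line holding the final byte
    seq = ["clflushopt [%#x]" % addr for addr in range(first, last + 1, line_size)]
    # Flushes to different lines are not ordered with each other or with
    # writes to other lines, so a single fence closes the sequence.
    seq.append("sfence")
    return seq


for line in flush_range_sequence(0x1010, 0x80):
    print(line)
```

Because the individual flushes are unordered among themselves, one fence after the whole loop is the usual pattern, rather than one fence per line.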


Speculative loads initiated by the processor, or specified explicitly using cache-prefetch instructions, 
can be reordered around a CLFLUSHOPT instruction. Such reordering can invalidate a speculatively 
prefetched cache line, unintentionally defeating the prefetch operation. 


The only way to avoid this situation is to use the MFENCE instruction after the CLFLUSHOPT 
instruction to force strong ordering of the CLFLUSHOPT instruction with respect to subsequent 
memory operations. 


The CLFLUSHOPT instruction behaves like a load instruction with respect to setting the page-table 
accessed and dirty bits. That is, it sets the page-table accessed bit to 1, but does not set the page-table 
dirty bit. 


The CLFLUSHOPT instruction executes at any privilege level. CLFLUSHOPT performs all the 
segmentation and paging checks that a 1-byte read would perform, except that it also allows references 
to execute-only segments. 


The CLFLUSHOPT instruction is supported if the feature flag CPUID
Fn0000_0007_EBX_x0[CLFLOPT] is set. The 8-bit field CPUID Fn0000_0001_EBX[CLFlush]
returns the size of the cache line in quadwords.


Mnemonic           Opcode         Description
CLFLUSHOPT mem8    66 0F AE /7    Flush cache line containing mem8.
Related Instructions

CLFLUSH

rFLAGS Affected
None

Exceptions
Exception (vector) | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | CLFLUSH instruction is not supported, as indicated by CPUID Fn0000_0001_EDX[CLFSH] = 0.
Invalid opcode, #UD | X | X | X | Instruction not supported by CPUID Fn0000_0007_EBX_x0[CLFLUSHOPT] = 0.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
CLWB Cache Line Write Back and Retain


Flushes the cache line specified by the mem linear address. The instruction checks all levels of the 
cache hierarchy—internal caches and external caches—and causes the cache line, if dirty, to be written 
to memory. The cache line may be retained in the cache where found in a non-dirty state. 


The CLWB instruction is weakly ordered with respect to other instructions that operate on memory. 
Speculative loads initiated by the processor, or specified explicitly using cache prefetch instructions, 
can be reordered around a CLWB instruction. CLWB is ordered naturally with older stores to the same 
address on the same logical processor. To create strict ordering of CLWB, use a store-ordering
instruction such as SFENCE.


The CLWB instruction behaves like a load instruction with respect to setting the page table accessed 
and dirty bits. That is, it sets the page table accessed bit to 1, but does not set the page table dirty bit. 


The CLWB instruction executes at any privilege level. CLWB performs all the segmentation and 
paging checks that a 1-byte read would perform, except that it also allows references to execute-only
segments.


The CLWB instruction is supported if CPUID Fn0000_0007_EBX[24] = 1.
The 8-bit field CPUID Fn0000_0001_EBX[CLFlush] returns the size of the cache line in quadwords.


Mnemonic    Opcode         Description
CLWB        66 0F AE /6    Cache line write-back.


Related Instructions 


CLFLUSH, CLFLUSHOPT, WBINVD, WBNOINVD 
rFLAGS Affected
ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions
Exception (vector) | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported by CPUID Fn0000_0007_EBX[24] = 0.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
CLZERO Zero Cache Line


Clears the cache line specified by the logical address in rAX by writing a zero to every byte in the line. 
The instruction uses an implied non-temporal memory type, similar to a streaming store, and uses the
write-combining protocol to minimize cache pollution.


CLZERO is weakly ordered with respect to other instructions that operate on memory. Software
should use an SFENCE or a stronger fence to enforce memory ordering of CLZERO with respect to other
store instructions.


The CLZERO instruction executes at any privilege level. CLZERO performs all the segmentation and 
paging checks that a store of the specified cache line would perform. 


The CLZERO instruction is supported if the feature flag CPUID Fn8000_0008_EBX[CLZERO] is
set. The 8-bit field CPUID Fn0000_0001_EBX[CLFlush] returns the size of the cache line in
quadwords.


Mnemonic      Opcode      Description
CLZERO rAX    0F 01 FC    Clears cache line containing rAX.


Related Instructions 


CLFLUSH 
rFLAGS Affected 
None 
Exceptions
Exception (vector) | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported by CPUID Fn8000_0008_EBX[CLZERO] = 0.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
CMC Complement Carry Flag
Complements (toggles) the carry flag (CF) bit of the rFLAGS register. 


Mnemonic Opcode Description 


CMC F5 Complement the carry flag (CF). 
Related Instructions 
CLC, STC 
rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       |    |    |    |    |    |    |    |    | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U. 


Exceptions 


None 
CMOVcc Conditional Move


Conditionally moves a 16-bit, 32-bit, or 64-bit value in memory or a general-purpose register (second 
operand) into a register (first operand), depending upon the settings of condition flags in the rFLAGS 
register. If the condition is not satisfied, the destination register is not modified. For the memory-based 
forms of CMOVcc, memory-related exceptions may be reported even if the condition is false. In 64-bit 
mode, CMOVcc with a 32-bit operand size will clear the upper 32 bits of the destination register even 
if the condition is false. 
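
The zero-extension behavior is easy to overlook, so here is a minimal Python model of it. The function name and the sample register values are illustrative assumptions; the model covers only the 32-bit operand size in 64-bit mode.

```python
MASK32 = 0xFFFFFFFF


def cmov32_in_64bit_mode(dest64, src32, condition):
    """Model CMOVcc with a 32-bit operand size in 64-bit mode.

    dest64 is the full 64-bit destination register before the instruction.
    """
    low = src32 if condition else dest64  # the low half moves only if cc holds
    return low & MASK32                   # upper 32 bits are cleared either way


# Hypothetical register contents.
print(hex(cmov32_in_64bit_mode(0x1122334455667788, 0xAABBCCDD, True)))
print(hex(cmov32_in_64bit_mode(0x1122334455667788, 0xAABBCCDD, False)))
```

In the false case the low 32 bits keep their old value while the upper 32 bits are still cleared, so the 64-bit register is modified even though no move takes place.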


The mnemonics of CMOVcc instructions denote the condition that must be satisfied. Most assemblers 
provide instruction mnemonics with A (above) and B (below) tags to supply the semantics for 
manipulating unsigned integers. Those with G (greater than) and L (less than) tags deal with signed 
integers. Many opcodes may be represented by synonymous mnemonics. For example, the CMOVL
instruction is synonymous with the CMOVNGE instruction; both denote the instruction with the opcode
0F 4C.
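
The relationship between opcodes, conditions, and synonymous mnemonics can be sketched in Python as follows. The function name and the small synonym dictionary are hypothetical; the conditions follow the opcode table in this section, under the observation that each odd opcode tests the negation of the preceding even opcode.

```python
def cmov_condition(cc, cf=0, zf=0, sf=0, of=0, pf=0):
    """Evaluate the condition for opcode 0F 4x, where cc is the low nibble x."""
    base = [
        of == 1,              # 0F 40: O
        cf == 1,              # 0F 42: B, C, NAE
        zf == 1,              # 0F 44: Z, E
        cf == 1 or zf == 1,   # 0F 46: BE, NA
        sf == 1,              # 0F 48: S
        pf == 1,              # 0F 4A: P, PE
        sf != of,             # 0F 4C: L, NGE
        zf == 1 or sf != of,  # 0F 4E: LE, NG
    ][cc >> 1]
    return base != bool(cc & 1)  # odd nibbles negate the even condition


# Synonymous mnemonics share one opcode nibble (illustrative subset).
SYNONYMS = {"CMOVL": 0xC, "CMOVNGE": 0xC, "CMOVG": 0xF, "CMOVNLE": 0xF}

print(cmov_condition(SYNONYMS["CMOVL"], sf=1, of=0))
print(cmov_condition(SYNONYMS["CMOVNGE"], sf=1, of=0))
```

Both calls evaluate the same condition (SF <> OF), which is why an assembler can accept either mnemonic for opcode 0F 4C.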


The feature flag CPUID Fn0000_0001_EDX[CMOV] or CPUID Fn8000_0001_EDX[CMOV] = 1
indicates support for CMOVcc instructions on a particular processor implementation. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic                       Opcode      Description
CMOVO reg16, reg/mem16
CMOVO reg32, reg/mem32         0F 40 /r    Move if overflow (OF = 1).
CMOVO reg64, reg/mem64

CMOVNO reg16, reg/mem16
CMOVNO reg32, reg/mem32        0F 41 /r    Move if not overflow (OF = 0).
CMOVNO reg64, reg/mem64

CMOVB reg16, reg/mem16
CMOVB reg32, reg/mem32         0F 42 /r    Move if below (CF = 1).
CMOVB reg64, reg/mem64

CMOVC reg16, reg/mem16
CMOVC reg32, reg/mem32         0F 42 /r    Move if carry (CF = 1).
CMOVC reg64, reg/mem64

CMOVNAE reg16, reg/mem16
CMOVNAE reg32, reg/mem32       0F 42 /r    Move if not above or equal (CF = 1).
CMOVNAE reg64, reg/mem64

CMOVNB reg16, reg/mem16
CMOVNB reg32, reg/mem32        0F 43 /r    Move if not below (CF = 0).
CMOVNB reg64, reg/mem64

CMOVNC reg16, reg/mem16
CMOVNC reg32, reg/mem32        0F 43 /r    Move if not carry (CF = 0).
CMOVNC reg64, reg/mem64

CMOVAE reg16, reg/mem16
CMOVAE reg32, reg/mem32        0F 43 /r    Move if above or equal (CF = 0).
CMOVAE reg64, reg/mem64

CMOVZ reg16, reg/mem16
CMOVZ reg32, reg/mem32         0F 44 /r    Move if zero (ZF = 1).
CMOVZ reg64, reg/mem64

CMOVE reg16, reg/mem16
CMOVE reg32, reg/mem32         0F 44 /r    Move if equal (ZF = 1).
CMOVE reg64, reg/mem64

CMOVNZ reg16, reg/mem16
CMOVNZ reg32, reg/mem32        0F 45 /r    Move if not zero (ZF = 0).
CMOVNZ reg64, reg/mem64

CMOVNE reg16, reg/mem16
CMOVNE reg32, reg/mem32        0F 45 /r    Move if not equal (ZF = 0).
CMOVNE reg64, reg/mem64

CMOVBE reg16, reg/mem16
CMOVBE reg32, reg/mem32        0F 46 /r    Move if below or equal (CF = 1 or ZF = 1).
CMOVBE reg64, reg/mem64

CMOVNA reg16, reg/mem16
CMOVNA reg32, reg/mem32        0F 46 /r    Move if not above (CF = 1 or ZF = 1).
CMOVNA reg64, reg/mem64

CMOVNBE reg16, reg/mem16
CMOVNBE reg32, reg/mem32       0F 47 /r    Move if not below or equal (CF = 0 and ZF = 0).
CMOVNBE reg64, reg/mem64

CMOVA reg16, reg/mem16
CMOVA reg32, reg/mem32         0F 47 /r    Move if above (CF = 0 and ZF = 0).
CMOVA reg64, reg/mem64

CMOVS reg16, reg/mem16
CMOVS reg32, reg/mem32         0F 48 /r    Move if sign (SF = 1).
CMOVS reg64, reg/mem64

CMOVNS reg16, reg/mem16
CMOVNS reg32, reg/mem32        0F 49 /r    Move if not sign (SF = 0).
CMOVNS reg64, reg/mem64

CMOVP reg16, reg/mem16
CMOVP reg32, reg/mem32         0F 4A /r    Move if parity (PF = 1).
CMOVP reg64, reg/mem64

CMOVPE reg16, reg/mem16
CMOVPE reg32, reg/mem32        0F 4A /r    Move if parity even (PF = 1).
CMOVPE reg64, reg/mem64

CMOVNP reg16, reg/mem16
CMOVNP reg32, reg/mem32        0F 4B /r    Move if not parity (PF = 0).
CMOVNP reg64, reg/mem64

CMOVPO reg16, reg/mem16
CMOVPO reg32, reg/mem32        0F 4B /r    Move if parity odd (PF = 0).
CMOVPO reg64, reg/mem64

CMOVL reg16, reg/mem16
CMOVL reg32, reg/mem32         0F 4C /r    Move if less (SF <> OF).
CMOVL reg64, reg/mem64

CMOVNGE reg16, reg/mem16
CMOVNGE reg32, reg/mem32       0F 4C /r    Move if not greater or equal (SF <> OF).
CMOVNGE reg64, reg/mem64

CMOVNL reg16, reg/mem16
CMOVNL reg32, reg/mem32        0F 4D /r    Move if not less (SF = OF).
CMOVNL reg64, reg/mem64

CMOVGE reg16, reg/mem16
CMOVGE reg32, reg/mem32        0F 4D /r    Move if greater or equal (SF = OF).
CMOVGE reg64, reg/mem64

CMOVLE reg16, reg/mem16
CMOVLE reg32, reg/mem32        0F 4E /r    Move if less or equal (ZF = 1 or SF <> OF).
CMOVLE reg64, reg/mem64

CMOVNG reg16, reg/mem16
CMOVNG reg32, reg/mem32        0F 4E /r    Move if not greater (ZF = 1 or SF <> OF).
CMOVNG reg64, reg/mem64

CMOVNLE reg16, reg/mem16
CMOVNLE reg32, reg/mem32       0F 4F /r    Move if not less or equal (ZF = 0 and SF = OF).
CMOVNLE reg64, reg/mem64

CMOVG reg16, reg/mem16
CMOVG reg32, reg/mem32         0F 4F /r    Move if greater (ZF = 0 and SF = OF).
CMOVG reg64, reg/mem64

Related Instructions

MOV

rFLAGS Affected

None

Exceptions
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | CMOVcc instruction is not supported, as indicated by CPUID Fn0000_0001_EDX[CMOV] or Fn8000_0001_EDX[CMOV] = 0.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   | X | X | An unaligned memory reference was performed while alignment checking was enabled.
CMP Compare


Compares the contents of a register or memory location (first operand) with an immediate value or the 
contents of a register or memory location (second operand), and sets or clears the status flags in the 
rFLAGS register to reflect the results. To perform the comparison, the instruction subtracts the second 
operand from the first operand and sets the status flags in the same manner as the SUB instruction, but 
does not alter the first operand. If the second operand is an immediate value, the instruction sign- 
extends the value to the length of the first operand. 


Use the CMP instruction to set the condition codes for a subsequent conditional jump (Jcc), 
conditional move (CMOVcc), or conditional SETcc instruction. Appendix F, “Instruction Effects on 
RFLAGS” shows how instructions affect the rFLAGS status flags. 


Mnemonic                 Opcode       Description
CMP AL, imm8             3C ib        Compare an 8-bit immediate value with the contents of the AL register.
CMP AX, imm16            3D iw        Compare a 16-bit immediate value with the contents of the AX register.
CMP EAX, imm32           3D id        Compare a 32-bit immediate value with the contents of the EAX register.
CMP RAX, imm32           3D id        Compare a 32-bit immediate value with the contents of the RAX register.
CMP reg/mem8, imm8       80 /7 ib     Compare an 8-bit immediate value with the contents of an 8-bit register or memory operand.
CMP reg/mem16, imm16     81 /7 iw     Compare a 16-bit immediate value with the contents of a 16-bit register or memory operand.
CMP reg/mem32, imm32     81 /7 id     Compare a 32-bit immediate value with the contents of a 32-bit register or memory operand.
CMP reg/mem64, imm32     81 /7 id     Compare a 32-bit signed immediate value with the contents of a 64-bit register or memory operand.
CMP reg/mem16, imm8      83 /7 ib     Compare an 8-bit signed immediate value with the contents of a 16-bit register or memory operand.
CMP reg/mem32, imm8      83 /7 ib     Compare an 8-bit signed immediate value with the contents of a 32-bit register or memory operand.
CMP reg/mem64, imm8      83 /7 ib     Compare an 8-bit signed immediate value with the contents of a 64-bit register or memory operand.
CMP reg/mem8, reg8       38 /r        Compare the contents of an 8-bit register or memory operand with the contents of an 8-bit register.
CMP reg/mem16, reg16     39 /r        Compare the contents of a 16-bit register or memory operand with the contents of a 16-bit register.
CMP reg/mem32, reg32     39 /r        Compare the contents of a 32-bit register or memory operand with the contents of a 32-bit register.
CMP reg/mem64, reg64     39 /r        Compare the contents of a 64-bit register or memory operand with the contents of a 64-bit register.
CMP reg8, reg/mem8       3A /r        Compare the contents of an 8-bit register with the contents of an 8-bit register or memory operand.
CMP reg16, reg/mem16     3B /r        Compare the contents of a 16-bit register with the contents of a 16-bit register or memory operand.
CMP reg32, reg/mem32     3B /r        Compare the contents of a 32-bit register with the contents of a 32-bit register or memory operand.
CMP reg64, reg/mem64     3B /r        Compare the contents of a 64-bit register with the contents of a 64-bit register or memory operand.


When interpreting operands as unsigned, flag settings are as follows: 


Operands CF ZF 
dest > source 0 0 
dest = source 0 1 
dest < source 1 0 


When interpreting operands as signed, flag settings are as follows: 


Operands OF ZF 
dest > source SF 0 
dest = source 0 1 
dest < source NOT SF 0 
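
The two tables above can be checked against a small Python model of the subtraction that CMP performs. The function name, the dictionary return type, and the 32-bit default width are assumptions made for illustration; only CF, ZF, SF, and OF are modeled.

```python
def cmp_flags(dest, source, bits=32):
    """Compute CF, ZF, SF, and OF as CMP would for two bits-wide operands."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    dest &= mask
    source &= mask
    result = (dest - source) & mask  # CMP discards this result
    return {
        "CF": int(dest < source),        # unsigned borrow
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),  # sign bit of the result
        # Signed overflow: the operands differ in sign and the result's
        # sign differs from the first operand's sign.
        "OF": int(bool((dest ^ source) & (dest ^ result) & sign)),
    }


# 7FFF_FFFFh (largest positive) compared with FFFF_FFFFh (-1 when signed).
flags = cmp_flags(0x7FFFFFFF, 0xFFFFFFFF)
print(flags["OF"] == flags["SF"])  # signed dest > source means OF equals SF
```

The same operand pair gives CF = 1 when read as unsigned values, which shows why unsigned and signed comparisons consult different flags.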


Related Instructions 


SUB, CMPSx, SCASx 
rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   | X | X | An unaligned memory reference was performed while alignment checking was enabled.
CMPS Compare Strings
CMPSB 
CMPSW 
CMPSD 
CMPSQ 


Compares the bytes, words, doublewords, or quadwords pointed to by the rSI and rDI registers, sets or 
clears the status flags of the rFLAGS register to reflect the results, and then increments or decrements 
the rSI and rDI registers according to the state of the DF flag in the rFLAGS register. To perform the 
comparison, the instruction subtracts the second operand from the first operand and sets the status 
flags in the same manner as the SUB instruction, but does not alter the first operand. The two operands 
must be the same size. 


If the DF flag is 0, the instruction increments rSI and rDI; otherwise, it decrements the pointers. It 
increments or decrements the pointers by 1, 2, 4, or 8, depending on the size of the operands. 


The forms of the CMPSx instruction with explicit operands address the first operand at seg:[rSI]. The 
value of seg defaults to the DS segment, but may be overridden by a segment prefix. These instructions 
always address the second operand at ES:[rDI]. ES may not be overridden. The explicit operands serve 
only to specify the type (size) of the values being compared and the segment used by the first operand. 


The no-operands forms of the instruction use the DS:[rSI] and ES:[rDI] registers to point to the values 
to be compared. The mnemonic determines the size of the operands. 


Do not confuse this CMPSD instruction with the same-mnemonic CMPSD (compare scalar double- 
precision floating-point) instruction in the 128-bit media instruction set. Assemblers can distinguish 
the instructions by the number and type of operands. 


For block comparisons, the CMPS instruction supports the REPE or REPZ prefixes (they are 
synonyms) and the REPNE or REPNZ prefixes (they are synonyms). For details about the REP 
prefixes, see “Repeat Prefixes” on page 12. If a conditional jump instruction like JL follows a CMPSx 
instruction, the jump occurs if the value of the seg:[rSI] operand is less than the ES:[rDI] operand. This 
action allows lexicographical comparisons of string or array elements. A CMPSx instruction can also 
operate inside a loop controlled by the LOOPcc instruction. 
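
A block comparison with the REPE prefix can be pictured with the following Python model of REPE CMPSB. It is a simplification under stated assumptions: DF is 0, both strings start at index 0, only ZF and CF are tracked, and the function name and return tuple are hypothetical.

```python
def repe_cmpsb(first, second, count):
    """Model REPE CMPSB with DF = 0 over two byte strings.

    first stands for the bytes at seg:[rSI], second for the bytes at ES:[rDI],
    and count for rCX. Returns (remaining_count, index, zf, cf).
    """
    index, zf, cf = 0, None, None  # flags are left as None if count is 0
    while count != 0:
        a, b = first[index], second[index]
        zf, cf = int(a == b), int(a < b)  # flags as for first - second
        index += 1                        # DF = 0: both pointers advance by 1
        count -= 1
        if zf == 0:                       # REPE stops at the first mismatch
            break
    return count, index, zf, cf


print(repe_cmpsb(b"axc", b"abc", 3))
```

Note that the pointers and the counter are updated even for the mismatching element, so the index of the differing byte is one less than the returned index.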


Mnemonic              Opcode    Description
CMPS mem8, mem8       A6        Compare the byte at DS:rSI with the byte at ES:rDI and then increment or decrement rSI and rDI.
CMPS mem16, mem16     A7        Compare the word at DS:rSI with the word at ES:rDI and then increment or decrement rSI and rDI.
CMPS mem32, mem32     A7        Compare the doubleword at DS:rSI with the doubleword at ES:rDI and then increment or decrement rSI and rDI.
CMPS mem64, mem64     A7        Compare the quadword at DS:rSI with the quadword at ES:rDI and then increment or decrement rSI and rDI.
CMPSB                 A6        Compare the byte at DS:rSI with the byte at ES:rDI and then increment or decrement rSI and rDI.
CMPSW                 A7        Compare the word at DS:rSI with the word at ES:rDI and then increment or decrement rSI and rDI.
CMPSD                 A7        Compare the doubleword at DS:rSI with the doubleword at ES:rDI and then increment or decrement rSI and rDI.
CMPSQ                 A7        Compare the quadword at DS:rSI with the quadword at ES:rDI and then increment or decrement rSI and rDI.

Related Instructions

CMP, SCASx

rFLAGS Affected
ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U. 


Exceptions
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   | X | X | An unaligned memory reference was performed while alignment checking was enabled.
CMPXCHG Compare and Exchange


Compares the value in the AL, AX, EAX, or RAX register with the value in a register or a memory 
location (first operand). If the two values are equal, the instruction copies the value in the second 
operand to the first operand and sets the ZF flag in the rFLAGS register to 1. Otherwise, it copies the 
value in the first operand to the AL, AX, EAX, or RAX register and clears the ZF flag to 0. 


The OF, SF, AF, PF, and CF flags are set to reflect the results of the compare. 


When the first operand is a memory operand, CMPXCHG always does a read-modify-write on the 
memory operand. If the compared operands were unequal, CMPXCHG writes the same value to the 
memory operand that was read. 
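
The compare-and-exchange behavior described above, including the unconditional write, can be summarized in a short Python model. The dictionary used as memory, the function name, and the sample values are illustrative assumptions; the OF, SF, AF, PF, and CF results are not modeled.

```python
def cmpxchg(memory, address, accumulator, source):
    """Model CMPXCHG on a memory operand; returns (accumulator, zf).

    memory is a dict standing in for the memory operand, accumulator for
    AL/AX/EAX/RAX, and source for the second (register) operand.
    """
    current = memory[address]
    if current == accumulator:
        memory[address] = source  # equal: store the second operand
        return accumulator, 1
    memory[address] = current     # unequal: the value read is written back
    return current, 0             # accumulator receives the memory value


mem = {0x10: 5}
print(cmpxchg(mem, 0x10, 5, 9), mem[0x10])  # values match, memory updated
print(cmpxchg(mem, 0x10, 5, 7), mem[0x10])  # values differ, memory kept
```

On failure the accumulator already holds the current memory value, so a retry loop can reuse it as the next expected value without an extra load.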


The forms of the CMPXCHG instruction that write to memory support the LOCK prefix. For details 
about the LOCK prefix, see “Lock Prefix” on page 11. 


Mnemonic                     Opcode       Description
CMPXCHG reg/mem8, reg8       0F B0 /r     Compare AL register with an 8-bit register or memory location. If equal, copy the second operand to the first operand. Otherwise, copy the first operand to AL.
CMPXCHG reg/mem16, reg16     0F B1 /r     Compare AX register with a 16-bit register or memory location. If equal, copy the second operand to the first operand. Otherwise, copy the first operand to AX.
CMPXCHG reg/mem32, reg32     0F B1 /r     Compare EAX register with a 32-bit register or memory location. If equal, copy the second operand to the first operand. Otherwise, copy the first operand to EAX.
CMPXCHG reg/mem64, reg64     0F B1 /r     Compare RAX register with a 64-bit register or memory location. If equal, copy the second operand to the first operand. Otherwise, copy the first operand to RAX.


Related Instructions 


CMPXCHG8B, CMPXCHG16B 
rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | The destination operand was in a non-writable segment.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   | X | X | An unaligned memory reference was performed while alignment checking was enabled.
CMPXCHG8B Compare and Exchange Eight Bytes
CMPXCHG16B Compare and Exchange Sixteen Bytes 


Compares the value in the rDX:rAX registers with a 64-bit or 128-bit value in the specified memory 
location. If the values are equal, the instruction copies the value in the rCX:rBX registers to the 
memory location and sets the zero flag (ZF) of the rFLAGS register to 1. Otherwise, it copies the value 
in memory to the rDX:rAX registers and clears ZF to 0. 


If the effective operand size is 16-bit or 32-bit, the CMPXCHG8B instruction is used. This instruction 
uses the EDX:EAX and ECX:EBX register operands and a 64-bit memory operand. If the effective 
operand size is 64-bit, the CMPXCHG16B instruction is used; this instruction uses the RDX:RAX and
RCX:RBX register operands and a 128-bit memory operand.


The CMPXCHG8B and CMPXCHG16B instructions always do a read-modify-write on the memory
operand. If the compared operands were unequal, the instructions write the same value to the memory 
operand that was read. 


The CMPXCHG8B and CMPXCHG16B instructions support the LOCK prefix. For details about the
LOCK prefix, see “Lock Prefix” on page 11. 


Support for the CMPXCHG8B and CMPXCHG16B instructions is implementation dependent.
Support for the CMPXCHG8B instruction is indicated by CPUID
Fn0000_0001_EDX[CMPXCHG8B] or Fn8000_0001_EDX[CMPXCHG8B] = 1. Support for the
CMPXCHG16B instruction is indicated by CPUID Fn0000_0001_ECX[CMPXCHG16B] = 1.


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


The memory operand used by CMPXCHG16B must be 16-byte aligned or else a general-protection
exception is generated. 
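
To make the register pairing and the alignment rule concrete, here is a minimal Python model of CMPXCHG16B. It is a sketch under stated assumptions: memory is a dictionary holding one 128-bit integer per address, the general-protection exception is represented by a Python ValueError, and the function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def cmpxchg16b(memory, address, rdx, rax, rcx, rbx):
    """Model CMPXCHG16B; returns (rdx, rax, zf)."""
    if address % 16 != 0:
        # Stand-in for the general-protection exception (#GP).
        raise ValueError("memory operand is not 16-byte aligned")
    expected = (rdx << 64) | rax           # RDX holds the upper quadword
    current = memory[address]
    if current == expected:
        memory[address] = (rcx << 64) | rbx
        return rdx, rax, 1
    memory[address] = current              # the value read is written back
    return current >> 64, current & MASK64, 0


mem = {0x20: (1 << 64) | 2}
print(cmpxchg16b(mem, 0x20, 1, 2, 3, 4))   # match: RCX:RBX is stored
print(cmpxchg16b(mem, 0x20, 1, 2, 5, 6))   # mismatch: RDX:RAX is reloaded
```

The 8-byte form follows the same pattern with EDX:EAX and ECX:EBX and a 64-bit memory value, without the 16-byte alignment requirement.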


Mnemonic               Opcode           Description
CMPXCHG8B mem64        0F C7 /1 m64     Compare EDX:EAX register to 64-bit memory location. If equal, set the zero flag (ZF) to 1 and copy the ECX:EBX register to the memory location. Otherwise, copy the memory location to EDX:EAX and clear the zero flag.
CMPXCHG16B mem128      0F C7 /1 m128    Compare RDX:RAX register to 128-bit memory location. If equal, set the zero flag (ZF) to 1 and copy the RCX:RBX register to the memory location. Otherwise, copy the memory location to RDX:RAX and clear the zero flag.

Related Instructions

CMPXCHG
rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       |    |    |    |    |    | M  |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | CMPXCHG8B instruction is not supported, as indicated by CPUID Fn0000_0001_EDX[CMPXCHG8B] or Fn8000_0001_EDX[CMPXCHG8B] = 0.
Invalid opcode, #UD |   |   | X | CMPXCHG16B instruction is not supported, as indicated by CPUID Fn0000_0001_ECX[CMPXCHG16B] = 0.
Invalid opcode, #UD | X | X | X | The operand was a register.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP |   |   | X | The destination operand was in a non-writable segment.
General protection, #GP |   |   | X | A null data segment was used to reference memory.
General protection, #GP |   |   | X | The memory operand for CMPXCHG16B was not aligned on a 16-byte boundary.
Page fault, #PF |   | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC |   | X | X | An unaligned memory reference was performed while alignment checking was enabled.
CPUID Processor Identification


Provides information about the processor and its capabilities through a number of different functions. 
Software should load the number of the CPUID function to execute into the EAX register before 
executing the CPUID instruction. The processor returns information in the EAX, EBX, ECX, and 
EDX registers; the contents and format of these registers depend on the function. 


The architecture supports CPUID information about standard functions and extended functions. The 
standard functions have numbers in the 0000_xxxxh series (for example, standard function 1). To 
determine the largest standard function number that a processor supports, execute CPUID function 0. 


The extended functions have numbers in the 8000_xxxxh series (for example, extended 
function 8000_0001h). To determine the largest extended function number that a processor supports, 
execute CPUID extended function 8000_0000h. If the value returned in EAX is greater than 
8000_0000h, the processor supports extended functions. 
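
The range check implied by the two paragraphs above can be sketched in Python. The function and parameter names are hypothetical, and the maximum function numbers in the example are made-up values rather than those of any particular processor.

```python
def cpuid_function_supported(function, max_standard, max_extended):
    """Decide whether a CPUID function number lies in a supported range.

    max_standard is the EAX value returned by function 0; max_extended is
    the EAX value returned by function 8000_0000h.
    """
    if function >= 0x80000000:
        # Extended functions exist only if the reported maximum exceeds 8000_0000h.
        return max_extended > 0x80000000 and function <= max_extended
    return function <= max_standard


# Hypothetical maxima: standard 0000_000Dh, extended 8000_001Fh.
print(cpuid_function_supported(0x00000007, 0x0D, 0x8000001F))
print(cpuid_function_supported(0x80000008, 0x0D, 0x8000001F))
print(cpuid_function_supported(0x00000010, 0x0D, 0x8000001F))
```

Checking the range before issuing a function keeps software from interpreting the results of an unsupported function number.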


Software operating at any privilege level can execute the CPUID instruction to collect this 
information. In 64-bit mode, this instruction works the same as in legacy mode except that it zero- 
extends 32-bit register results to 64 bits. 


CPUID is a serializing instruction. 


Mnemonic    Opcode    Description
CPUID       0F A2     Returns information about the processor and its capabilities. EAX specifies the function number, and the data is returned in EAX, EBX, ECX, EDX.


Testing for the CPUID Instruction 


To avoid an invalid-opcode exception (#UD) on those processor implementations that do not support 
the CPUID instruction, software must first test to determine if the CPUID instruction is supported. 
Support for the CPUID instruction is indicated by the ability to write the ID bit in the rFLAGS register. 
Normally, 32-bit software uses the PUSHFD and POPFD instructions in an attempt to write 
rFLAGS.ID. After reading the updated rFLAGS.ID bit, a comparison determines if the operation 
changed its value. If the value changed, the processor executing the code supports the CPUID 
instruction. If the value did not change, rFLAGS.ID is not writable, and the processor does not support 
the CPUID instruction. 


The following code sample shows how to test for the presence of the CPUID instruction using 32-bit 
code. 


    pushfd                  ; save EFLAGS
    pop eax                 ; store EFLAGS in EAX
    mov ebx, eax            ; save in EBX for later testing
    xor eax, 00200000h      ; toggle bit 21
    push eax                ; push to stack
    popfd                   ; save changed EAX to EFLAGS
    pushfd                  ; push EFLAGS to TOS
    pop eax                 ; store EFLAGS in EAX
    cmp eax, ebx            ; see if bit 21 has changed
    jz NO_CPUID             ; if no change, no CPUID


Standard Function 0 and Extended Function 8000_0000h


CPUID standard function 0 loads the EAX register with the largest CPUID standard function number 
supported by the processor implementation; similarly, CPUID extended function 8000_0000h loads 
the EAX register with the largest extended function number supported. 


Standard function 0 and extended function 8000_0000h both load a 12-character string into the EBX, 
EDX, and ECX registers identifying the processor vendor. For AMD processors, the string is 
AuthenticAMD. This string informs software that it should follow the AMD CPUID definition for 
subsequent CPUID function calls. If the function returns another vendor’s string, software must use 
that vendor’s CPUID definition when interpreting the results of subsequent CPUID function calls. 
Table 3-2 shows the contents of the EBX, EDX, and ECX registers after executing function 0 on an 
AMD processor. 


Table 3-2. Processor Vendor Return Values 


Register | Return Value | ASCII Characters
EBX      | 6874_7541h   | “h t u A”
EDX      | 6974_6E65h   | “i t n e”
ECX      | 444D_4163h   | “D M A c”
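
As an illustration of Table 3-2, the following Python sketch reassembles the vendor string from the three register values. The function name `vendor_string` is hypothetical, and the sketch assumes the registers have already been read by some other means; it does not execute CPUID itself.

```python
import struct

def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Concatenate EBX, EDX, ECX (in that order) as little-endian ASCII bytes."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")

# Register values from Table 3-2.
print(vendor_string(0x68747541, 0x69746E65, 0x444D4163))
```

Each register holds four characters with the first character in the least-significant byte, which is why the table lists the characters in reverse order.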


For a description of all feature flags related to instruction subset support, see Appendix D, “Instruction 
Subsets and CPUID Feature Flags,” on page 587. For a description of all defined feature numbers and 
return values, see Appendix E, “Obtaining Processor Information Via the CPUID Instruction,” on 
page 593. 


Related Instructions 
None 

rFLAGS Affected 
None 

Exceptions 


None 
CRC32 CRC32 Cyclical Redundancy Check 


Performs one step of a 32-bit cyclic redundancy check. 


The first source, which is also the destination, is a doubleword value in either a 32-bit or 64-bit GPR 
depending on the presence of a REX prefix and the value of the REX.W bit. The second source is a 
GPR or memory location of width 8, 16, or 32 bits. A vector of width 40, 48, or 64 bits is derived from 
the two operands as follows: 


1. The low-order 32 bits of the first operand are bit-wise inverted and shifted left by the width of the 
second operand. 


2. The second operand is bit-wise inverted and shifted left by 32 bits. 


3. The results of steps 1 and 2 are XORed. 


This vector is interpreted as a polynomial of degree 40, 48, or 64 over the field of two elements (i.e., bit 
i is interpreted as the coefficient of X^i). This polynomial is divided by the polynomial of degree 32 that 
is similarly represented by the vector 11EDC6F41h. (The division admits an efficient iterative 
implementation based on the xor operation.) The remainder is encoded as a 32-bit vector, which is 
bit-wise inverted and written to the destination. In the case of a 64-bit destination, the upper 32 bits are 
cleared. 


In an application of the CRC algorithm, a data block is partitioned into byte, word, or doubleword 
segments and CRC32 is executed iteratively, once for each segment. 
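
The steps above can be sketched as a bit-level Python model. This is an illustration, not a description of any hardware implementation. It assumes that "bit-wise inverted" means bit-order reversal (reflection) rather than complementing; that reading reproduces the published CRC-32C check value for the string "123456789". The names `reflect`, `poly_mod`, and `crc32_step` are hypothetical.

```python
POLY = 0x11EDC6F41  # degree-32 divisor polynomial named in the text

def reflect(value: int, width: int) -> int:
    """Reverse the order of the low `width` bits of value."""
    result = 0
    for i in range(width):
        if (value >> i) & 1:
            result |= 1 << (width - 1 - i)
    return result

def poly_mod(vector: int, poly: int = POLY) -> int:
    """Remainder of polynomial division over the field of two elements."""
    plen = poly.bit_length()
    while vector.bit_length() >= plen:
        vector ^= poly << (vector.bit_length() - plen)
    return vector

def crc32_step(dest: int, src: int, width: int) -> int:
    """One CRC32 step; width is the size of the second operand in bits."""
    t1 = reflect(dest & 0xFFFFFFFF, 32) << width        # step 1
    t2 = reflect(src & ((1 << width) - 1), width) << 32  # step 2
    return reflect(poly_mod(t1 ^ t2), 32)                # step 3, divide, reverse

# Iterating over a data block one byte at a time. The initial value
# FFFF_FFFFh and the final complement are CRC-32C conventions applied by
# software, not by the instruction.
crc = 0xFFFFFFFF
for byte in b"123456789":
    crc = crc32_step(crc, byte, 8)
print(hex(crc ^ 0xFFFFFFFF))
```

Processing a doubleword in one step gives the same result as four byte steps taken in little-endian memory order, which is why software can mix segment sizes within one data block.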


CRC32 is an SSE4.2 instruction. Support for SSE4.2 instructions is indicated by CPUID 
Fn0000_0001_ECX[SSE42] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Instruction Encoding 


Mnemonic               | Encoding             | Notes
CRC32 reg32, reg/mem8  | F2 0F 38 F0 /r       | Perform CRC32 operation on 8-bit values
CRC32 reg32, reg/mem8  | F2 REX 0F 38 F0 /r   |
CRC32 reg32, reg/mem16 | F2 0F 38 F1 /r       | Effective operand size determines size of second operand.
CRC32 reg32, reg/mem32 | F2 0F 38 F1 /r       | Effective operand size determines size of second operand.
CRC32 reg64, reg/mem8  | F2 REX.W 0F 38 F0 /r | REX.W = 1.
CRC32 reg64, reg/mem64 | F2 REX.W 0F 38 F1 /r | REX.W = 1.
rFLAGS Affected
None

Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD     | X    | X            | X         | Lock prefix used
                        | X    | X            | X         | SSE42 instructions are not supported as indicated by CPUID Fn0000_0001_ECX[SSE42] = 0.
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | The destination operand was in a non-writable segment.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
DAA Decimal Adjust after Addition 


Adjusts the value in the AL register into a packed BCD result and sets the CF and AF flags in the 
rFLAGS register to indicate a decimal carry out of either nibble of AL. 


Use this instruction to adjust the result of a byte ADD instruction that performed the binary addition of 
one 2-digit packed BCD value to another. 


The instruction performs the adjustment by adding 06h to AL if the lower nibble is greater than 9 or if 
AF = 1. Then 60h is added to AL if the original AL was greater than 99h or if CF = 1. 


If the lower nibble of AL was adjusted, the AF flag is set to 1. Otherwise AF is not modified. If the 
upper nibble of AL was adjusted, the CF flag is set to 1. Otherwise, CF is not modified. SF, ZF, and PF 
are set according to the final value of AL. 
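
The adjustment rule can be modeled in a few lines of Python. This is a sketch of the rule exactly as worded above, including the statement that AF and CF are left unmodified when no adjustment is made. The function name `daa` and the tuple return value are illustrative assumptions.

```python
def daa(al: int, cf: int = 0, af: int = 0):
    """Return (AL, CF, AF) after the decimal adjustment described in the text."""
    original_al = al
    if (al & 0x0F) > 9 or af:
        al = (al + 0x06) & 0xFF   # adjust the lower nibble
        af = 1
    if original_al > 0x99 or cf:
        al = (al + 0x60) & 0xFF   # adjust the upper nibble
        cf = 1
    return al, cf, af

# 15 + 27 in packed BCD: the binary ADD leaves 3Ch in AL with AF = CF = 0.
al, cf, af = daa(0x15 + 0x27)
print(f"{al:02X}h CF={cf} AF={af}")
```

For 99 + 01 the binary sum is 9Ah; both nibbles are adjusted, AL becomes 00h, and CF = 1 carries into the next BCD byte.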


Using this instruction in 64-bit mode generates an invalid-opcode (#UD) exception. 


Mnemonic | Opcode | Description
DAA      | 27     | Decimal adjust AL. (Invalid in 64-bit mode.)


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | U  |    |    |    | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 


Exception           | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD |      |              | X         | This instruction was executed in 64-bit mode.
DAS Decimal Adjust after Subtraction 


Adjusts the value in the AL register into a packed BCD result and sets the CF and AF flags in the 
rFLAGS register to indicate a decimal borrow. 


Use this instruction to adjust the result of a byte SUB instruction that performed a binary subtraction of 
one 2-digit, packed BCD value from another. 


This instruction performs the adjustment by subtracting 06h from AL if the lower nibble is greater than 
9 or if AF = 1. Then 60h is subtracted from AL if the original AL was greater than 99h or if CF = 1. 


If the adjustment changes the lower nibble of AL, the AF flag is set to 1; otherwise AF is not modified. 
If the adjustment results in a borrow for either nibble of AL, the CF flag is set to 1; otherwise CF is not 
modified. The SF, ZF, and PF flags are set according to the final value of AL. 
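
The following Python sketch models the subtraction adjustment as worded above. The function name `das` and the tuple return value are illustrative assumptions. The borrow check on the 06h step follows the "borrow for either nibble" wording and never triggers for valid packed BCD inputs.

```python
def das(al: int, cf: int = 0, af: int = 0):
    """Return (AL, CF, AF) after the decimal adjustment described in the text."""
    original_al, original_cf = al, cf
    if (al & 0x0F) > 9 or af:
        cf = cf | int(al < 0x06)  # borrow out of the byte from the 06h step
        al = (al - 0x06) & 0xFF   # adjust the lower nibble
        af = 1
    if original_al > 0x99 or original_cf:
        al = (al - 0x60) & 0xFF   # adjust the upper nibble
        cf = 1
    return al, cf, af

# 42 - 15 in packed BCD: the binary SUB leaves 2Dh in AL with AF = 1, CF = 0.
al, cf, af = das(0x2D, cf=0, af=1)
print(f"{al:02X}h CF={cf} AF={af}")
```

For 10 - 20 the binary SUB leaves F0h with CF = 1; the adjustment yields 90h with CF = 1, the borrow that software propagates to the next BCD byte.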


Using this instruction in 64-bit mode generates an invalid-opcode (#UD) exception. 


Mnemonic | Opcode | Description
DAS      | 2F     | Decimal adjusts AL after subtraction. (Invalid in 64-bit mode.)


Related Instructions 
DAA 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | U  |    |    |    | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 


Exception           | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD |      |              | X         | This instruction was executed in 64-bit mode.
DEC Decrement by 1 


Subtracts 1 from the specified register or memory location. The CF flag is not affected. 


The one-byte forms of this instruction (opcodes 48 through 4F) are used as REX prefixes in 64-bit 
mode. See “REX Prefix” on page 14. 
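
The dual use of these byte values can be illustrated with a small Python sketch. The function `classify_byte` is hypothetical and covers only bytes 40h through 4Fh: 40h through 47h are the one-byte INC forms and 48h through 4Fh are the one-byte DEC forms outside 64-bit mode. The low three bits select the register, as in the +rw/+rd opcode notation.

```python
def classify_byte(opcode: int, in_64bit_mode: bool) -> str:
    """Interpret a byte in the range 40h-4Fh according to the operating mode."""
    if not 0x40 <= opcode <= 0x4F:
        raise ValueError("only bytes 40h through 4Fh are modeled")
    if in_64bit_mode:
        return "REX prefix"
    mnemonic = "INC" if opcode <= 0x47 else "DEC"
    return f"{mnemonic} register {opcode & 0x07}"

print(classify_byte(0x48, in_64bit_mode=False))
print(classify_byte(0x48, in_64bit_mode=True))
```

In 64-bit mode, software therefore uses the FE /1 and FF /1 forms to decrement a register.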


The forms of the DEC instruction that write to memory support the LOCK prefix. For details about the 
LOCK prefix, see “Lock Prefix” on page 11. 


To perform a decrement operation that updates the CF flag, use a SUB instruction with an immediate 
operand of 1. 


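Mnemonic      | Opcode | Description
DEC reg/mem8  | FE /1  | Decrement the contents of an 8-bit register or memory location by 1.
DEC reg/mem16 | FF /1  | Decrement the contents of a 16-bit register or memory location by 1.
DEC reg/mem32 | FF /1  | Decrement the contents of a 32-bit register or memory location by 1.
DEC reg/mem64 | FF /1  | Decrement the contents of a 64-bit register or memory location by 1.
DEC reg16     | 48 +rw | Decrement the contents of a 16-bit register by 1. (These opcodes are used as REX prefixes in 64-bit mode. See “REX Prefix” on page 14.)
DEC reg32     | 48 +rd | Decrement the contents of a 32-bit register by 1. (These opcodes are used as REX prefixes in 64-bit mode. See “REX Prefix” on page 14.)
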
Related Instructions 
INC, SUB 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | M  | M  |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 
Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded the data segment limit or was non-canonical.
                        |      |              | X         | The destination operand was in a non-writable segment.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
DIV Unsigned Divide 


Divides the unsigned value in a register by the unsigned value in the specified register or memory 
location. The register to be divided depends on the size of the divisor. 


When dividing a word, the dividend is in the AX register. The instruction stores the quotient in the AL 
register and the remainder in the AH register. 


When dividing a doubleword, quadword, or double quadword, the most-significant word of the 
dividend is in the rDX register and the least-significant word is in the rAX register. After the division, 
the instruction stores the quotient in the rAX register and the remainder in the rDX register. 


The following table summarizes the action of this instruction: 


Division Size            | Dividend | Divisor   | Quotient | Remainder | Maximum Quotient
Word/byte                | AX       | reg/mem8  | AL       | AH        | 255
Doubleword/word          | DX:AX    | reg/mem16 | AX       | DX        | 65,535
Quadword/doubleword      | EDX:EAX  | reg/mem32 | EAX      | EDX       | 2^32 - 1
Double quadword/quadword | RDX:RAX  | reg/mem64 | RAX      | RDX       | 2^64 - 1


The instruction truncates non-integral results towards 0 and the remainder is always less than the 
divisor. An overflow generates a #DE (divide error) exception, rather than setting the CF flag. 


Division by zero generates a divide-by-zero exception. 
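
The behavior summarized in the table can be modeled with a short Python sketch. The names `div` and `DivideError` are hypothetical; `DivideError` stands in for the #DE exception, and `divisor_bits` is the divisor width (8, 16, 32, or 64). The dividend is passed as a single integer rather than as a register pair.

```python
class DivideError(Exception):
    """Stands in for the #DE exception."""

def div(dividend: int, divisor: int, divisor_bits: int):
    """Return (quotient, remainder) of an unsigned divide, or raise DivideError."""
    if divisor == 0:
        raise DivideError("the divisor operand was 0")
    quotient, remainder = divmod(dividend, divisor)
    if quotient > (1 << divisor_bits) - 1:
        raise DivideError("the quotient was too large for the designated register")
    return quotient, remainder

# Word/byte division: AX = 0103h divided by 2 gives AL = 81h, AH = 01h.
print(div(0x0103, 2, 8))
```

An attempt such as `div(0x0200, 1, 8)` raises `DivideError`, because the quotient 200h exceeds the word/byte maximum of 255.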


Mnemonic      | Opcode | Description
DIV reg/mem8  | F6 /6  | Perform unsigned division of AX by the contents of an 8-bit register or memory location and store the quotient in AL and the remainder in AH.
DIV reg/mem16 | F7 /6  | Perform unsigned division of DX:AX by the contents of a 16-bit register or memory operand and store the quotient in AX and the remainder in DX.
DIV reg/mem32 | F7 /6  | Perform unsigned division of EDX:EAX by the contents of a 32-bit register or memory location and store the quotient in EAX and the remainder in EDX.
DIV reg/mem64 | F7 /6  | Perform unsigned division of RDX:RAX by the contents of a 64-bit register or memory location and store the quotient in RAX and the remainder in RDX.


Related Instructions 


MUL 
rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | U  |    |    |    | U  | U  | U  | U  | U
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Divide by zero, #DE     | X    | X            | X         | The divisor operand was 0.
                        | X    | X            | X         | The quotient was too large for the designated register.
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
ENTER Create Procedure Stack Frame 


Creates a stack frame for a procedure. 
The first operand specifies the size of the stack frame allocated by the instruction. 


The second operand specifies the nesting level (0 to 31—the value is automatically masked to 5 bits). 
For nesting levels of 1 or greater, the processor copies earlier stack frame pointers before adjusting the 
stack pointer. This action provides a called procedure with access points to other nested stack frames. 


The 32-bit enter N, 0 (a nesting level of 0) instruction is equivalent to the following 32-bit 
instruction sequence: 


push ebp ; save current EBP 
mov ebp, esp ; set stack frame pointer value 
sub esp, N ; allocate space for local variables 


The ENTER and LEAVE instructions provide support for block structured languages. The LEAVE 
instruction releases the stack frame on returning from a procedure. 


In 64-bit mode, the operand size of ENTER defaults to 64 bits, and there is no prefix available for 
encoding a 32-bit operand size. 


Mnemonic          | Opcode   | Description
ENTER imm16, 0    | C8 iw 00 | Create a procedure stack frame.
ENTER imm16, 1    | C8 iw 01 | Create a nested stack frame for a procedure.
ENTER imm16, imm8 | C8 iw ib | Create a nested stack frame for a procedure.

Action 


// See “Pseudocode Definition” on page 57.
ENTER_START:

temp_ALLOC_SPACE = word-sized immediate specified in the instruction
                   (first operand), zero-extended to 64 bits
temp_LEVEL = byte-sized immediate specified in the instruction
             (second operand), zero-extended to 64 bits

temp_LEVEL = temp_LEVEL AND 0x1f
             // only keep 5 bits of level count

PUSH.v old_RBP

temp_RBP = RSP // This value of RSP will eventually be loaded
               // into RBP.

IF (temp_LEVEL>0) // Push "temp_LEVEL" parameters to the stack.
{
    FOR (I=1; I<temp_LEVEL; I++)
        // All but one of the parameters are copied
        // from higher up on the stack.
    {
        temp_DATA = READ_MEM.v [SS:old_RBP-I*V]
        PUSH.v temp_DATA
    }
    PUSH.v temp_RBP // The last parameter is the offset of the old
                    // value of RSP on the stack.
}

RSP.s = RSP - temp_ALLOC_SPACE // Leave "temp_ALLOC_SPACE" free bytes on
                               // the stack

WRITE_MEM.v [SS:RSP.s] = temp_unused // ENTER finishes with a memory write
                                     // check on the final stack pointer,
                                     // but no write actually occurs.

RBP.v = temp_RBP
EXIT
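
The action above can be traced with a small Python model. This is a sketch under stated assumptions: the stack is a dictionary keyed by address, `regs` holds only `rsp` and `rbp`, the operand size `v` defaults to 8 bytes as in 64-bit mode, and the final write check and all exception handling are omitted. The function name `enter` and its parameters are hypothetical.

```python
def enter(regs: dict, stack: dict, alloc_space: int, level: int, v: int = 8) -> None:
    """Model ENTER: regs has 'rsp' and 'rbp'; stack maps address -> value."""
    def push(value: int) -> None:
        regs["rsp"] -= v
        stack[regs["rsp"]] = value

    level &= 0x1F                       # only 5 bits of the level count are kept
    old_rbp = regs["rbp"]
    push(old_rbp)
    temp_rbp = regs["rsp"]              # becomes the new frame pointer
    if level > 0:
        for i in range(1, level):       # copy earlier frame pointers
            push(stack[old_rbp - i * v])
        push(temp_rbp)                  # last parameter: the new frame pointer
    regs["rsp"] -= alloc_space & 0xFFFF  # leave alloc_space free bytes
    regs["rbp"] = temp_rbp

regs = {"rsp": 0x1000, "rbp": 0x2000}
stack = {}
enter(regs, stack, alloc_space=32, level=0)
print({name: hex(value) for name, value in regs.items()})
```

With a nesting level of 0 the model reduces to the push, mov, sub sequence shown earlier: the old frame pointer is saved, the frame pointer takes the new stack-pointer value, and the stack pointer drops by the allocation size.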


Related Instructions 


LEAVE 

rFLAGS Affected 
None 

Exceptions 

Exception            | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS           | X    | X            | X         | A memory address exceeded the stack-segment limit or was non-canonical.
Page fault, #PF      |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
IDIV Signed Divide 


Divides the signed value in a register by the signed value in the specified register or memory location. 
The register to be divided depends on the size of the divisor. 


When dividing a word, the dividend is in the AX register. The instruction stores the quotient in the AL 
register and the remainder in the AH register. 


When dividing a doubleword, quadword, or double quadword, the most-significant word of the 
dividend is in the rDX register and the least-significant word is in the rAX register. After the division, 
the instruction stores the quotient in the rAX register and the remainder in the rDX register. 


The following table summarizes the action of this instruction: 


Division Size            | Dividend | Divisor   | Quotient | Remainder | Quotient Range
Word/byte                | AX       | reg/mem8  | AL       | AH        | -128 to +127
Doubleword/word          | DX:AX    | reg/mem16 | AX       | DX        | -32,768 to +32,767
Quadword/doubleword      | EDX:EAX  | reg/mem32 | EAX      | EDX       | -2^31 to 2^31 - 1
Double quadword/quadword | RDX:RAX  | reg/mem64 | RAX      | RDX       | -2^63 to 2^63 - 1


The instruction truncates non-integral results towards 0. The sign of the remainder is always the same 
as the sign of the dividend, and the absolute value of the remainder is less than the absolute value of the 
divisor. An overflow generates a #DE (divide error) exception, rather than setting the OF flag. 


To avoid overflow problems, precede this instruction with a CBW, CWD, CDQ, or CQO instruction to 
sign-extend the dividend. 
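
The rounding and sign rules can be checked with a short Python model. The names `idiv` and `DivideError` are hypothetical; `DivideError` stands in for the #DE exception. Python's own `//` operator rounds toward negative infinity, so the sketch divides magnitudes and restores the sign to obtain truncation toward 0.

```python
class DivideError(Exception):
    """Stands in for the #DE exception."""

def idiv(dividend: int, divisor: int, divisor_bits: int):
    """Return (quotient, remainder) of a signed divide, or raise DivideError."""
    if divisor == 0:
        raise DivideError("the divisor operand was 0")
    quotient = abs(dividend) // abs(divisor)     # truncate toward 0
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor    # takes the sign of the dividend
    low = -(1 << (divisor_bits - 1))
    high = (1 << (divisor_bits - 1)) - 1
    if not low <= quotient <= high:
        raise DivideError("the quotient was too large for the designated register")
    return quotient, remainder

print(idiv(-7, 2, 8))
print(idiv(7, -2, 8))
```

Dividing -32,768 by -1 with an 8-bit divisor raises `DivideError`, because +32,768 lies outside the word/byte quotient range.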


Mnemonic       | Opcode | Description
IDIV reg/mem8  | F6 /7  | Perform signed division of AX by the contents of an 8-bit register or memory location and store the quotient in AL and the remainder in AH.
IDIV reg/mem16 | F7 /7  | Perform signed division of DX:AX by the contents of a 16-bit register or memory location and store the quotient in AX and the remainder in DX.
IDIV reg/mem32 | F7 /7  | Perform signed division of EDX:EAX by the contents of a 32-bit register or memory location and store the quotient in EAX and the remainder in EDX.
IDIV reg/mem64 | F7 /7  | Perform signed division of RDX:RAX by the contents of a 64-bit register or memory location and store the quotient in RAX and the remainder in RDX.
Related Instructions
IMUL

rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | U  |    |    |    | U  | U  | U  | U  | U
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Divide by zero, #DE     | X    | X            | X         | The divisor operand was 0.
                        | X    | X            | X         | The quotient was too large for the designated register.
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
IMUL Signed Multiply 


Multiplies two signed operands. The number of operands determines the form of the instruction. 


If a single operand is specified, the instruction multiplies the value in the specified general-purpose 
register or memory location by the value in the AL, AX, EAX, or RAX register (depending on the 
operand size) and stores the product in AX, DX:AX, EDX:EAX, or RDX:RAX, respectively. 


If two operands are specified, the instruction multiplies the value in a general-purpose register (first 
operand) by an immediate value or the value in a general-purpose register or memory location (second 
operand) and stores the product in the first operand location. 


If three operands are specified, the instruction multiplies the value in a general-purpose register or 
memory location (second operand), by an immediate value (third operand) and stores the product in a 
register (first operand). 


The IMUL instruction sign-extends an immediate operand to the length of the other register/memory 
operand. 


The CF and OF flags are set if, due to integer overflow, the double-width multiplication result cannot 
be represented in the half-width destination register. Otherwise the CF and OF flags are cleared. 
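
The CF/OF rule for the two-operand and three-operand forms can be sketched in Python as follows. The function name `imul` and its return shape are illustrative assumptions; the operands are Python integers assumed to fit in `bits` as signed values.

```python
def imul(a: int, b: int, bits: int):
    """Return (signed truncated product, overflow), where overflow is CF = OF."""
    product = a * b
    truncated = product & ((1 << bits) - 1)  # keep the low `bits` bits
    if truncated >> (bits - 1):              # reinterpret as a signed value
        truncated -= 1 << bits
    overflow = int(truncated != product)     # full product does not fit
    return truncated, overflow

print(imul(0x4000, 2, 16))
print(imul(-2, 3, 16))
```

In the first call the full product 8000h does not fit in a signed 16-bit register, so the destination receives the truncated value and CF and OF are set; in the second call the product fits and both flags are cleared.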


Mnemonic                     | Opcode   | Description
IMUL reg/mem8                | F6 /5    | Multiply the contents of AL by the contents of an 8-bit memory or register operand and put the signed result in AX.
IMUL reg/mem16               | F7 /5    | Multiply the contents of AX by the contents of a 16-bit memory or register operand and put the signed result in DX:AX.
IMUL reg/mem32               | F7 /5    | Multiply the contents of EAX by the contents of a 32-bit memory or register operand and put the signed result in EDX:EAX.
IMUL reg/mem64               | F7 /5    | Multiply the contents of RAX by the contents of a 64-bit memory or register operand and put the signed result in RDX:RAX.
IMUL reg16, reg/mem16        | 0F AF /r | Multiply the contents of a 16-bit destination register by the contents of a 16-bit register or memory operand and put the signed result in the 16-bit destination register.
IMUL reg32, reg/mem32        | 0F AF /r | Multiply the contents of a 32-bit destination register by the contents of a 32-bit register or memory operand and put the signed result in the 32-bit destination register.
IMUL reg64, reg/mem64        | 0F AF /r | Multiply the contents of a 64-bit destination register by the contents of a 64-bit register or memory operand and put the signed result in the 64-bit destination register.
IMUL reg16, reg/mem16, imm8  | 6B /r ib | Multiply the contents of a 16-bit register or memory operand by a sign-extended immediate byte and put the signed result in the 16-bit destination register.
IMUL reg32, reg/mem32, imm8  | 6B /r ib | Multiply the contents of a 32-bit register or memory operand by a sign-extended immediate byte and put the signed result in the 32-bit destination register.
IMUL reg64, reg/mem64, imm8  | 6B /r ib | Multiply the contents of a 64-bit register or memory operand by a sign-extended immediate byte and put the signed result in the 64-bit destination register.
IMUL reg16, reg/mem16, imm16 | 69 /r iw | Multiply the contents of a 16-bit register or memory operand by a sign-extended immediate word and put the signed result in the 16-bit destination register.
IMUL reg32, reg/mem32, imm32 | 69 /r id | Multiply the contents of a 32-bit register or memory operand by a sign-extended immediate double and put the signed result in the 32-bit destination register.
IMUL reg64, reg/mem64, imm32 | 69 /r id | Multiply the contents of a 64-bit register or memory operand by a sign-extended immediate double and put the signed result in the 64-bit destination register.
Related Instructions 
IDIV 
rFLAGS Affected 
ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | U  | U  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
IN Input from Port 


Transfers a byte, word, or doubleword from an I/O port to the AL, AX, or EAX register. The port 
address is specified either by an 8-bit immediate value (00h to FFh) encoded in the instruction or a 16- 
bit value contained in the DX register (0000h to FFFFh). The processor’s I/O address space is distinct 
from system memory addressing. 


For two opcodes (E4h and ECh), the data size of the port is fixed at 8 bits. For the other opcodes (E5h 
and EDh), the effective operand-size determines the port size. If the effective operand size is 64 bits, 
IN reads only 32 bits from the I/O port. 
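
The port-size rule can be summarized in a few lines of Python. The function name `in_transfer_bits` is hypothetical, and `effective_operand_size` is assumed to be 16, 32, or 64.

```python
def in_transfer_bits(opcode: int, effective_operand_size: int) -> int:
    """Return the number of bits an IN instruction reads from the I/O port."""
    if opcode in (0xE4, 0xEC):
        return 8                                # port size fixed at 8 bits
    if opcode in (0xE5, 0xED):
        return min(effective_operand_size, 32)  # 64-bit operand size reads 32 bits
    raise ValueError("not an IN opcode")

print(in_transfer_bits(0xED, 64))
```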


If the CPL is higher than IOPL, or the mode is virtual mode, IN checks the I/O permission bitmap in 
the TSS before allowing access to the I/O port. (See Volume 2 for details on the TSS I/O permission 
bitmap.) 


Mnemonic     | Opcode | Description
IN AL, imm8  | E4 ib  | Input a byte from the port at the address specified by imm8 and put it into the AL register.
IN AX, imm8  | E5 ib  | Input a word from the port at the address specified by imm8 and put it into the AX register.
IN EAX, imm8 | E5 ib  | Input a doubleword from the port at the address specified by imm8 and put it into the EAX register.
IN AL, DX    | EC     | Input a byte from the port at the address specified by the DX register and put it into the AL register.
IN AX, DX    | ED     | Input a word from the port at the address specified by the DX register and put it into the AX register.
IN EAX, DX   | ED     | Input a doubleword from the port at the address specified by the DX register and put it into the EAX register.


Related Instructions 


INSx, OUT, OUTSx 


rFLAGS Affected 


None 
Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
General protection, #GP |      | X            |           | One or more I/O permission bits were set in the TSS for the accessed port.
                        |      |              | X         | The CPL was greater than the IOPL and one or more I/O permission bits were set in the TSS for the accessed port.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
INC Increment by 1 


Adds 1 to the specified register or memory location. The CF flag is not affected, even if the operand is 
incremented to 0000. 


The one-byte forms of this instruction (opcodes 40 through 47) are used as REX prefixes in 64-bit 
mode. See “REX Prefix” on page 14. 


The forms of the INC instruction that write to memory support the LOCK prefix. For details about the 
LOCK prefix, see “Lock Prefix” on page 11. 


To perform an increment operation that updates the CF flag, use an ADD instruction with an 
immediate operand of 1. 
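
The difference between INC and ADD with an immediate operand of 1 can be seen in a minimal Python model of the 8-bit forms. The function names `inc8` and `add8_one` are hypothetical, and only CF, ZF, SF, and OF are modeled.

```python
def inc8(value: int, cf: int):
    """8-bit INC: CF passes through unchanged."""
    result = (value + 1) & 0xFF
    flags = {"CF": cf, "ZF": int(result == 0), "SF": result >> 7,
             "OF": int(value == 0x7F)}
    return result, flags

def add8_one(value: int):
    """8-bit ADD with an immediate operand of 1: CF reflects the carry out."""
    total = value + 1
    result = total & 0xFF
    flags = {"CF": total >> 8, "ZF": int(result == 0), "SF": result >> 7,
             "OF": int(value == 0x7F)}
    return result, flags

print(inc8(0xFF, cf=0))
print(add8_one(0xFF))
```

Both calls wrap the operand to 00h and set ZF, but only the ADD model reports the carry in CF.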


Mnemonic      | Opcode | Description
INC reg/mem8  | FE /0  | Increment the contents of an 8-bit register or memory location by 1.
INC reg/mem16 | FF /0  | Increment the contents of a 16-bit register or memory location by 1.
INC reg/mem32 | FF /0  | Increment the contents of a 32-bit register or memory location by 1.
INC reg/mem64 | FF /0  | Increment the contents of a 64-bit register or memory location by 1.
INC reg16     | 40 +rw | Increment the contents of a 16-bit register by 1. (These opcodes are used as REX prefixes in 64-bit mode. See “REX Prefix” on page 14.)
INC reg32     | 40 +rd | Increment the contents of a 32-bit register by 1. (These opcodes are used as REX prefixes in 64-bit mode. See “REX Prefix” on page 14.)


Related Instructions 


ADD, DEC 
rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | M  | M  |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | The destination operand was in a non-writable segment.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
INS Input String 
INSB 
INSW 
INSD 


Transfers data from the I/O port specified in the DX register to an input buffer specified in the rDI 
register and increments or decrements the rDI register according to the setting of the DF flag in the 
rFLAGS register. 


If the DF flag is 0, the instruction increments rDI by 1, 2, or 4, depending on the number of bytes read. 
If the DF flag is 1, it decrements the pointer by 1, 2, or 4. 


In 16-bit and 32-bit mode, the INS instruction always uses ES as the data segment. The ES segment 
cannot be overridden with a segment override prefix. In 64-bit mode, INS always uses the 
unsegmented memory space. 


The INS instructions use the explicit memory operand (first operand) to determine the size of the I/O 
port, but always use ES:[rDI] for the location of the input buffer. The explicit register operand (second 
operand) specifies the I/O port address and must always be DX. 


The INSB, INSW, and INSD instructions copy byte, word, and doubleword data, respectively, from 
the I/O port (0000h to FFFFh) specified in the DX register to the input buffer specified in the ES:rDI 
registers. 


If the operand size is 64 bits, the instruction behaves as if the operand size were 32 bits. 


If the CPL is higher than the IOPL or the mode is virtual mode, INSx checks the I/O permission bitmap 
in the TSS before allowing access to the I/O port. (See Volume 2 for details on the TSS I/O permission 
bitmap.) 


The INSx instructions support the REP prefix for block input of rCX bytes, words, or doublewords. 
For details about the REP prefix, see “Repeat Prefixes” on page 12. 
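
A repeated string input can be modeled as a simple loop, as in the Python sketch below. The names `rep_ins` and `read_port` are hypothetical: `memory` is a bytearray standing in for the input buffer, `read_port` is a callable returning the next value from the port, and `size` is 1, 2, or 4 bytes. Segmentation, I/O permission checks, and exceptions are not modeled.

```python
def rep_ins(memory: bytearray, rdi: int, rcx: int, size: int, df: int, read_port):
    """Model REP INSx; returns the updated (rDI, rCX)."""
    step = -size if df else size   # DF = 1 decrements, DF = 0 increments
    while rcx:
        memory[rdi:rdi + size] = read_port().to_bytes(size, "little")
        rdi += step
        rcx -= 1
    return rdi, rcx

buffer = bytearray(8)
values = iter([0x1122, 0x3344])
print(rep_ins(buffer, rdi=0, rcx=2, size=2, df=0, read_port=lambda: next(values)))
print(buffer.hex())
```

With DF = 1 the same loop fills the buffer from higher addresses toward lower ones, so software points rDI at the last element instead of the first.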


Mnemonic      | Opcode | Description
INS mem8, DX  | 6C     | Input a byte from the port specified by DX, put it into the memory location specified in ES:rDI, and then increment or decrement rDI.
INS mem16, DX | 6D     | Input a word from the port specified by DX register, put it into the memory location specified in ES:rDI, and then increment or decrement rDI.
INS mem32, DX | 6D     | Input a doubleword from the port specified by DX, put it into the memory location specified in ES:rDI, and then increment or decrement rDI.
INSB          | 6C     | Input a byte from the port specified by DX, put it into the memory location specified in ES:rDI, and then increment or decrement rDI.
INSW          | 6D     | Input a word from the port specified by DX, put it into the memory location specified in ES:rDI, and then increment or decrement rDI.
INSD          | 6D     | Input a doubleword from the port specified by DX, put it into the memory location specified in ES:rDI, and then increment or decrement rDI.
Related Instructions 
IN, OUT, OUTSx 
rFLAGS Affected 
None 
Exceptions 
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      | X            |           | One or more I/O permission bits were set in the TSS for the accessed port.
                        |      |              | X         | The CPL was greater than the IOPL and one or more I/O permission bits were set in the TSS for the accessed port.
                        |      |              | X         | A null data segment was used to reference memory.
                        |      |              | X         | The destination operand was in a non-writable segment.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
INT Interrupt to Vector 


Transfers execution to the interrupt handler specified by an 8-bit unsigned immediate value. This value 
is an interrupt vector number (00h to FFh), which the processor uses as an index into the interrupt- 
descriptor table (IDT). 


For detailed descriptions of the steps performed by INTn instructions, see the following: 


• Legacy-Mode Interrupts: “Virtual-8086 Mode Interrupt Control Transfers” in Volume 2. 
• Long-Mode Interrupts: “Long-Mode Interrupt Control Transfers” in Volume 2. 


See also the descriptions of the INT3 instruction on page 381 and the INTO instruction on page 195. 


Mnemonic Opcode Description 
INT imm8  CD ib   Call interrupt service routine specified by interrupt
                  vector imm8.
Action 


// For functions READ_IDT, READ_DESCRIPTOR, READ_INNER_LEVEL_SP,
// ShadowStacksEnabled and SET_TOKEN_BUSY see "Pseudocode Definition"
// on page 57

INT_N_START:

IF (REAL_MODE)
    INT_N_REAL          // INTn real mode
ELSEIF (PROTECTED_MODE)
    INT_N_PROTECTED     // INTn protected mode
ELSE // (VIRTUAL_MODE)
    INT_N_VIRTUAL       // INTn virtual mode

INT_N_REAL:

    temp_int_n_vector = byte-sized interrupt vector specified in
                        the instruction, zero-extended to 64 bits

    // read target CS:RIP from the real-mode IDT
    temp_RIP = READ_MEM.w [idt:temp_int_n_vector*4]
    temp_CS  = READ_MEM.w [idt:temp_int_n_vector*4+2]

    PUSH.w old_RFLAGS
    PUSH.w old_CS
    PUSH.w next_RIP

    IF (temp_RIP > CS.limit)
        EXCEPTION [#GP(0)]

    CS.sel  = temp_CS
    CS.base = temp_CS SHL 4

    RFLAGS.AC,TF,IF,RF cleared
    RIP = temp_RIP

    EXIT


INT_N_PROTECTED:

    temp_int_n_vector = byte-sized interrupt vector specified in
                        the instruction, zero-extended to 64 bits

    temp_idt_desc = READ_IDT (temp_int_n_vector)

    IF (temp_idt_desc.attr.type == 'taskgate')
        TASK_SWITCH // using TSS selector in the task gate as the target TSS

    // The size of the gate controls the size of the stack pushes
    IF (LONG_MODE)
        v = 8-byte // Long mode only uses 64-bit gates
    ELSEIF ((temp_idt_desc.attr.type == 'intgate32') ||
            (temp_idt_desc.attr.type == 'trapgate32'))
        v = 4-byte // Legacy mode, using a 32-bit gate
    ELSE
        v = 2-byte // Legacy mode, using a 16-bit gate

    temp_RIP = temp_idt_desc.offset

    IF (LONG_MODE) // In long mode, read 2nd half of 16-byte interrupt-gate
    {              // from the IDT to get the upper 32 bits of target RIP
        temp_upper = READ_MEM.q [idt:temp_int_n_vector*16+8]
        temp_RIP = temp_RIP + (temp_upper SHL 32) // form 64-bit target RIP
    }

    CS = READ_DESCRIPTOR (temp_idt_desc.segment, intcs_chk)

    IF (CS.attr.conforming == 1)
        temp_CPL = CPL
    ELSE
        temp_CPL = CS.attr.dpl



    IF (CPL == temp_CPL) // no privilege-level change
    {
        temp_CheckToken = FALSE
        IF (LONG_MODE)
        {
            IF (temp_idt_desc.ist != 0)
            {
                // IDT gate IST is non-zero, do stack switch
                RSP = READ_MEM.q [tss:ist_index*8+28] // fetch new RSP
                RSP = RSP AND 0xFFFFFFFFFFFFFFF0 // ensure 16-byte alignment

                // fetch SSP from ISST if sstk enabled at current privilege
                IF (ShadowStacksEnabled(current CPL))
                {
                    temp_isst_addr = INTERRUPT_SSP_TABLE_ADDR + (temp_idt_desc.ist*8)
                    SSP = READ_MEM.q [tss:temp_isst_addr]
                    IF (SSP[2:0] != 0)
                        EXCEPTION [#GP(0)] // new SSP must be 8-byte aligned
                    temp_CheckToken = TRUE
                }
            }
            PUSH.q old_SS // in long mode, save old SS:RSP to stack
            PUSH.q old_RSP
        } // end long mode

        PUSH.v old_RFLAGS
        PUSH.v old_CS
        PUSH.v next_RIP

        IF (ShadowStacksEnabled(current CPL))
        {
            IF (temp_CheckToken == TRUE)
                SET_SSTK_TOKEN_BUSY (SSP) // validate token, set busy
            Align SSP to next 8B boundary, storing 4B of 0 if needed
            SSTK_WRITE_MEM.q [SSP-24] = old_CS // push CS,LIP,SSP to shadow stack
            SSTK_WRITE_MEM.q [SSP-16] = (CS.base + old_RIP)
            SSTK_WRITE_MEM.q [SSP-8] = old_SSP
            SSP = SSP - 24
        } // end shadow stacks enabled @ CPL

        IF ((64BIT_MODE) && (temp_RIP is non-canonical) ||
            (!64BIT_MODE) && (temp_RIP > CS.limit))
            EXCEPTION [#GP(0)]
        RFLAGS.VM,NT,TF,RF cleared
        RFLAGS.IF cleared if interrupt gate
        RIP = temp_RIP

        EXIT
    } // end of INTn to same privilege level
    ELSE // INTn to more privileged level
    {
        // (CPL > temp_CPL), changing privilege so get inner level SS:RSP
        CPL = temp_CPL
        temp_SS_desc:temp_RSP = READ_INNER_LEVEL_SP(CPL, temp_idt_desc.ist)

        IF (LONG_MODE)
            temp_RSP = temp_RSP AND 0xFFFFFFFFFFFFFFF0 // force 16-byte alignment
        RSP = temp_RSP
        SS = temp_SS_desc

        IF (ShadowStacksEnabled(new CPL))
        {
            old_SSP = SSP
            IF ((temp_idt_desc.ist == 0) || (!LONG_MODE))
                SSP = PLn_SSP // where n=new CPL
            ELSE
            {
                temp_isst_addr = INTERRUPT_SSP_TABLE_ADDR + (temp_idt_desc.ist*8)
                SSP = READ_MEM.q [tss:temp_isst_addr]
            }
            IF (SSP[2:0] != 0) // new SSP must be 8-byte aligned
                EXCEPTION [#GP(0)]
        }

        // Any #SS from the following pushes uses SS.sel as error code
        PUSH.v old_SS
        PUSH.v old_RSP
        PUSH.v old_RFLAGS
        PUSH.v old_CS
        PUSH.v next_RIP

        IF ((ShadowStacksEnabled(CPL 3)) && (old_CPL == 3))
            PL3_SSP = SSP

        IF (ShadowStacksEnabled(new CPL))
        {
            old_SSP = SSP
            SSP = PLn_SSP // where n=new CPL
            SET_SSTK_TOKEN_BUSY(SSP) // validate token, set busy
            IF (old_CPL != 3)
            {
                SSTK_WRITE_MEM.q [SSP-24] = old_CS // push CS, LIP, SSP
                SSTK_WRITE_MEM.q [SSP-16] = LIP // onto the shadow stack
                SSTK_WRITE_MEM.q [SSP-8] = old_SSP
                SSP = SSP - 24
            }
        } // end shadow stacks enabled at new CPL

        IF ((64BIT_MODE) && (temp_RIP is non-canonical) ||
            (!64BIT_MODE) && (temp_RIP > CS.limit))
            EXCEPTION [#GP(0)]

        RFLAGS.VM,NT,TF,RF cleared
        RFLAGS.IF cleared if interrupt gate
        RIP = temp_RIP

        EXIT
    } // end INTn to more privileged level


INT_N_VIRTUAL:

    temp_int_n_vector = byte-sized interrupt vector specified in
                        the instruction, zero-extended to 64 bits

    IF (CR4.VME == 0) // VME isn't enabled
        IF (RFLAGS.IOPL == 3)
            INT_N_VIRTUAL_TO_PROTECTED
        ELSE
            EXCEPTION [#GP(0)]

    temp_IRB_BASE = READ_MEM.w [tss:102] - 32

    // Check the VME Interrupt Redirection Bitmap (IRB) to
    // see if we should redirect to a virtual-mode handler
    temp_VME_REDIRECTION = READ_BIT_ARRAY ([tss:temp_IRB_BASE], temp_int_n_vector)
    IF (temp_VME_REDIRECTION == 1)
    { // continue with transition to protected mode
        IF (RFLAGS.IOPL == 3)
            INT_N_VIRTUAL_TO_PROTECTED
        ELSE
            EXCEPTION [#GP(0)]
    }
    ELSE
    { // INTn stays in virtual mode
        // redirect interrupt through virtual-mode IDT
        temp_RIP = READ_MEM.w [0:temp_int_n_vector*4]
        // read target CS:RIP from the virtual-mode IDT at linear address 0
        temp_CS = READ_MEM.w [0:temp_int_n_vector*4+2]
        IF (RFLAGS.IOPL < 3)
            old_RFLAGS = old_RFLAGS with VIF bit shifted into IF bit, and IOPL = 3
        PUSH.w old_RFLAGS
        PUSH.w old_CS
        PUSH.w next_RIP
        CS.sel = temp_CS
        CS.base = temp_CS SHL 4
        RFLAGS.TF,RF = 0
        IF (RFLAGS.IOPL == 3)
            RFLAGS.IF = 0
        ELSE
            RFLAGS.VIF = 0
        RIP = temp_RIP
        EXIT
    }


INT_N_VIRTUAL_TO_PROTECTED:

    temp_idt_desc = READ_IDT (temp_int_n_vector)
    IF (temp_idt_desc.attr.type == 'taskgate')
        TASK_SWITCH // using tss selector in the task gate as the target tss

    // The size of the gate controls the size of the stack pushes
    IF ((temp_idt_desc.attr.type == 'intgate32') ||
        (temp_idt_desc.attr.type == 'trapgate32'))
        v = 4-byte // legacy mode, using a 32-bit gate
    ELSE // gate is intgate16 or trapgate16
        v = 2-byte // legacy mode, using a 16-bit gate

    temp_RIP = temp_idt_desc.offset
    old_CPL = CPL
    CS = READ_DESCRIPTOR(temp_idt_desc.segment, intcs_chk)

    IF (CS.attr.dpl != 0) // Handler must run at CPL 0.
        EXCEPTION [#GP(CS.sel)]

    CPL = 0
    temp_ist = 0 // Legacy mode doesn't use IST pointers
    temp_SS_desc:temp_RSP = READ_INNER_LEVEL_SP(CPL, temp_ist)
    RSP = temp_RSP
    SS = temp_SS_desc

    // Any #SS from the following pushes uses SS.sel as error code
    PUSH.v old_GS
    PUSH.v old_FS
    PUSH.v old_DS
    PUSH.v old_ES
    PUSH.v old_SS
    PUSH.v old_RSP
    PUSH.v old_RFLAGS // Pushed with RF = 0
    PUSH.v old_CS
    PUSH.v next_RIP

    IF (temp_RIP > CS.limit)
        EXCEPTION [#GP(0)]

    DS = NULL // can't use virtual-mode selectors in protected mode
    ES = NULL // can't use virtual-mode selectors in protected mode
    FS = NULL // can't use virtual-mode selectors in protected mode
    GS = NULL // can't use virtual-mode selectors in protected mode
    RFLAGS.VM,NT,TF,RF cleared
    RFLAGS.IF cleared if interrupt gate
    RIP = temp_RIP

    IF (ShadowStacksEnabled(CPL 0))
    {
        old_SSP = SSP
        SSP = PL0_SSP // fetch new SSP

        SET_SSTK_TOKEN_BUSY(SSP) // validate token, set busy
        IF (old_CPL != 3)
        {
            SSTK_WRITE_MEM.q [SSP-24] = old_CS // push CS, LIP, SSP
            SSTK_WRITE_MEM.q [SSP-16] = LIP // onto the shadow stack
            SSTK_WRITE_MEM.q [SSP-8] = old_SSP
            SSP = SSP - 24
        }
    }


EXIT // end INTn VIRTUAL_TO_PROTECTED 
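
Three of the arithmetic steps in the Action pseudocode can be modelled in a few lines of Python. This is a hedged sketch rather than a description of any implementation: the function names are hypothetical, and the bit ordering assumed for READ_BIT_ARRAY (bit n lives in byte n/8 at bit position n mod 8) is an assumption of the sketch.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def align_rsp_16(rsp):
    """RSP AND 0xFFFFFFFFFFFFFFF0: force 16-byte stack alignment."""
    return rsp & 0xFFFFFFFFFFFFFFF0


def form_target_rip(offset_low32, upper_qword):
    """temp_RIP + (temp_upper SHL 32), kept to 64 bits.

    `upper_qword` stands for the quadword read from the second half of a
    16-byte long-mode gate; bits shifted past bit 63 are discarded.
    """
    return (offset_low32 + (upper_qword << 32)) & MASK64


def irb_bit(irb, vector):
    """Bit `vector` of a 32-byte (256-bit) interrupt redirection bitmap.

    Assumed layout: bit n is in byte n // 8 at bit position n % 8.
    A value of 1 corresponds to the protected-mode path in the pseudocode.
    """
    return (irb[vector >> 3] >> (vector & 7)) & 1
```

A stack pointer of 7FFFFFFFE3C9h, for instance, is rounded down to 7FFFFFFFE3C0h before the pushes begin.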
Related Instructions


INT 3, INTO, BOUND 


rFLAGS Affected 


If a task switch occurs, all flags are modified. Otherwise settings are as follows: 


ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
          M    M    M    0    M                     M    0
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions 
Exception                           Real  Virtual 8086  Protected  Cause of Exception
Invalid TSS, #TS (selector)               X             X          As part of a stack switch, the target stack segment selector or rSP in the TSS was beyond the TSS limit.
                                          X             X          As part of a stack switch, the target stack segment selector in the TSS was a null selector.
                                          X             X          As part of a stack switch, the target stack segment selector’s TI bit was set, but the LDT selector was a null selector.
                                          X             X          As part of a stack switch, the target stack segment selector in the TSS was beyond the limit of the GDT or LDT descriptor table.
                                          X             X          As part of a stack switch, the target stack segment selector in the TSS contained a RPL that was not equal to its DPL.
                                          X             X          As part of a stack switch, the target stack segment selector in the TSS contained a DPL that was not equal to the CPL of the code segment selector.
                                          X             X          As part of a stack switch, the target stack segment selector in the TSS was not a writable segment.
Segment not present, #NP (selector)       X             X          The accessed code segment, interrupt gate, trap gate, task gate, or TSS was not present.
Stack, #SS                          X     X             X          A memory address exceeded the stack segment limit or was non-canonical, and no stack switch occurred.
Stack, #SS (selector)                     X             X          After a stack switch, a memory address exceeded the stack segment limit or was non-canonical.
                                          X             X          As part of a stack switch, the SS register was loaded with a non-null segment selector and the segment was marked not present.
General protection, #GP             X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                    X     X             X          The target offset exceeded the code segment limit or was non-canonical.
                                          X                        The IOPL was less than 3 and CR4.VME was 0.
                                          X                        IOPL was less than 3, CR4.VME was 1, and the corresponding bit in the VME interrupt redirection bitmap was 1.
General protection, #GP (selector)  X     X             X          The interrupt vector was beyond the limit of IDT.
                                          X             X          The descriptor in the IDT was not an interrupt, trap, or task gate in legacy mode or not a 64-bit interrupt or trap gate in long mode.
                                          X             X          The DPL of the interrupt, trap, or task gate descriptor was less than the CPL.
                                          X             X          The segment selector specified by the interrupt or trap gate had its TI bit set, but the LDT selector was a null selector.
                                          X             X          The segment descriptor specified by the interrupt or trap gate exceeded the descriptor table limit or was a null selector.
                                          X             X          The segment descriptor specified by the interrupt or trap gate was not a code segment in legacy mode, or not a 64-bit code segment in long mode.
                                                        X          The DPL of the segment specified by the interrupt or trap gate was greater than the CPL.
                                          X                        The DPL of the segment specified by the interrupt or trap gate was not 0 or it was a conforming segment.
Page fault, #PF                           X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                      X             X          An unaligned memory reference was performed while alignment checking was enabled.

INTO Interrupt to Overflow Vector


Checks the overflow flag (OF) in the rFLAGS register and calls the overflow exception (#OF) handler 
if the OF flag is set to 1. This instruction has no effect if the OF flag is cleared to 0. The INTO 
instruction detects overflow in signed number addition. See AMD64 Architecture Programmer’s 
Manual Volume 1: Application Programming for more information on the OF flag. 


Using this instruction in 64-bit mode generates an invalid-opcode exception. 
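
To make the condition that INTO tests concrete, the following Python sketch models how a signed addition sets OF. It is an illustrative model under the usual two's-complement rule (overflow occurs when both operands share a sign and the result has the opposite sign); the function name and the default 32-bit width are assumptions of the sketch.

```python
def add_sets_of(a, b, bits=32):
    """Return (result, OF) for a `bits`-wide two's-complement addition."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a + b) & mask
    # OF is set when the result's sign differs from the sign of both operands.
    overflow = ((a ^ result) & (b ^ result) & sign) != 0
    return result, overflow
```

With these inputs, adding 1 to 7FFFFFFFh sets OF, so a following INTO would vector to the #OF handler; adding 1 to FFFFFFFFh (that is, -1 + 1) leaves OF clear and INTO has no effect.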


For detailed descriptions of the steps performed by INT instructions, see the following: 


• Legacy-Mode Interrupts: “Legacy Protected-Mode Interrupt Control Transfers” in Volume 2.
• Long-Mode Interrupts: “Long-Mode Interrupt Control Transfers” in Volume 2.


Mnemonic  Opcode  Description
INTO      CE      Call overflow exception if the overflow flag is set.
                  (Invalid in 64-bit mode.)


Action 
IF (64BIT_MODE)
    EXCEPTION [#UD]
IF (RFLAGS.OF == 1)  // #OF is a trap, and pushes the rIP of the instruction
    EXCEPTION [#OF]  // following INTO.
EXIT


Related Instructions 


INT, INT 3, BOUND 


rFLAGS Affected 
None. 
Exceptions 
Exception                           Real  Virtual 8086  Protected  Cause of Exception
Overflow, #OF                       X     X             X          The INTO instruction was executed with OF set to 1.
Invalid opcode, #UD                                     X          Instruction was executed in 64-bit mode.

Jcc Jump on Condition


Checks the status flags in the rFLAGS register and, if the flags meet the condition specified by the 
condition code in the mnemonic (cc), jumps to the target instruction located at the specified relative 
offset. Otherwise, execution continues with the instruction following the Jcc instruction. 


Unlike the unconditional jump (JMP), conditional jump instructions have only two forms—short and 
near conditional jumps. Different opcodes correspond to different forms of one instruction. For 
example, the JO instruction (jump if overflow) has opcode 0Fh 80h for its near form and 70h for its
short form, but the mnemonic is the same for both forms. The only difference is that the near form has 
a 16- or 32-bit relative displacement, while the short form always has an 8-bit relative displacement. 


Mnemonics are provided to deal with the programming semantics of both signed and unsigned 
numbers. Instructions tagged A (above) and B (below) are intended for use in unsigned integer code; 
those tagged G (greater) and L (less) are intended for use in signed integer code. 
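
The difference between the unsigned (A/B) and signed (G/L) mnemonics can be seen by modelling the flags a CMP would produce and then evaluating a few of the conditions listed in the table for this instruction. The Python below is a sketch under that assumption; the helper names are hypothetical and only four of the conditions are included.

```python
def flags_after_cmp(a, b, bits=32):
    """Model the CF, ZF, SF, and OF results of CMP a, b (computes a - b)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    r = (a - b) & mask
    return {
        "CF": int(a < b),                                  # unsigned borrow
        "ZF": int(r == 0),
        "SF": int((r & sign) != 0),
        "OF": int(((a ^ b) & (a ^ r) & sign) != 0),        # signed overflow
    }


# Conditions as given in the mnemonic table.
CONDITIONS = {
    "JA": lambda f: f["CF"] == 0 and f["ZF"] == 0,   # unsigned above
    "JB": lambda f: f["CF"] == 1,                    # unsigned below
    "JG": lambda f: f["ZF"] == 0 and f["SF"] == f["OF"],  # signed greater
    "JL": lambda f: f["SF"] != f["OF"],              # signed less
}


def taken(mnemonic, flags):
    """True if the conditional jump named `mnemonic` would be taken."""
    return CONDITIONS[mnemonic](flags)
```

Comparing the 8-bit values FFh and 01h, JA is taken (255 is above 1 as an unsigned value) and so is JL (-1 is less than 1 as a signed value), while JB and JG fall through.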


If the jump is taken, the signed displacement is added to the rIP (of the following instruction) and the 
result is truncated to 16, 32, or 64 bits, depending on operand size. 


In 64-bit mode, the operand size defaults to 64 bits. The processor sign-extends the 8-bit or 32-bit 
displacement value to 64 bits before adding it to the RIP. 
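
The target computation described in the two paragraphs above can be sketched as follows. The function names are hypothetical; `next_rip` stands for the rIP of the instruction following the jump.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def jcc_target(next_rip, disp, disp_bits, operand_bits):
    """Add the signed displacement to next_rip and truncate to operand size."""
    return (next_rip + sign_extend(disp, disp_bits)) & ((1 << operand_bits) - 1)
```

For a two-byte short jump located at 401000h with displacement FEh (-2), `next_rip` is 401002h and the computed target is 401000h, the jump instruction itself.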


These instructions cannot perform far jumps (to other code segments). To create a far-conditional- 
jump code sequence corresponding to a high-level language statement like: 

IF A == B THEN GOTO FarLabel 
where FarLabel is located in another code segment, use the opposite condition in a conditional short 
jump before an unconditional far jump. Such a code sequence might look like: 


    cmp  A,B            ; compare operands
    jne  NextInstr      ; continue program if not equal
    jmp  far FarLabel   ; far jump if operands are equal

NextInstr:              ; continue program


For details about control-flow instructions, see “Control Transfers” in Volume 1, and “Control- 
Transfer Privilege Checks” in Volume 2. 


Mnemonic        Opcode      Description
JO rel8off      70 cb       Jump if overflow (OF = 1).
JO rel16off     0F 80 cw
JO rel32off     0F 80 cd
JNO rel8off     71 cb       Jump if not overflow (OF = 0).
JNO rel16off    0F 81 cw
JNO rel32off    0F 81 cd
JB rel8off      72 cb       Jump if below (CF = 1).
JB rel16off     0F 82 cw
JB rel32off     0F 82 cd



JC rel8off      72 cb       Jump if carry (CF = 1).
JC rel16off     0F 82 cw
JC rel32off     0F 82 cd
JNAE rel8off    72 cb       Jump if not above or equal (CF = 1).
JNAE rel16off   0F 82 cw
JNAE rel32off   0F 82 cd
JNB rel8off     73 cb       Jump if not below (CF = 0).
JNB rel16off    0F 83 cw
JNB rel32off    0F 83 cd
JNC rel8off     73 cb       Jump if not carry (CF = 0).
JNC rel16off    0F 83 cw
JNC rel32off    0F 83 cd
JAE rel8off     73 cb       Jump if above or equal (CF = 0).
JAE rel16off    0F 83 cw
JAE rel32off    0F 83 cd
JZ rel8off      74 cb       Jump if zero (ZF = 1).
JZ rel16off     0F 84 cw
JZ rel32off     0F 84 cd
JE rel8off      74 cb       Jump if equal (ZF = 1).
JE rel16off     0F 84 cw
JE rel32off     0F 84 cd
JNZ rel8off     75 cb       Jump if not zero (ZF = 0).
JNZ rel16off    0F 85 cw
JNZ rel32off    0F 85 cd
JNE rel8off     75 cb       Jump if not equal (ZF = 0).
JNE rel16off    0F 85 cw
JNE rel32off    0F 85 cd
JBE rel8off     76 cb       Jump if below or equal (CF = 1 or ZF = 1).
JBE rel16off    0F 86 cw
JBE rel32off    0F 86 cd
JNA rel8off     76 cb       Jump if not above (CF = 1 or ZF = 1).
JNA rel16off    0F 86 cw
JNA rel32off    0F 86 cd
JNBE rel8off    77 cb       Jump if not below or equal (CF = 0 and ZF = 0).
JNBE rel16off   0F 87 cw
JNBE rel32off   0F 87 cd
JA rel8off      77 cb       Jump if above (CF = 0 and ZF = 0).
JA rel16off     0F 87 cw
JA rel32off     0F 87 cd
JS rel8off      78 cb       Jump if sign (SF = 1).
JS rel16off     0F 88 cw
JS rel32off     0F 88 cd
JNS rel8off     79 cb       Jump if not sign (SF = 0).
JNS rel16off    0F 89 cw
JNS rel32off    0F 89 cd

JP rel8off      7A cb       Jump if parity (PF = 1).
JP rel16off     0F 8A cw
JP rel32off     0F 8A cd
JPE rel8off     7A cb       Jump if parity even (PF = 1).
JPE rel16off    0F 8A cw
JPE rel32off    0F 8A cd
JNP rel8off     7B cb       Jump if not parity (PF = 0).
JNP rel16off    0F 8B cw
JNP rel32off    0F 8B cd
JPO rel8off     7B cb       Jump if parity odd (PF = 0).
JPO rel16off    0F 8B cw
JPO rel32off    0F 8B cd
JL rel8off      7C cb       Jump if less (SF <> OF).
JL rel16off     0F 8C cw
JL rel32off     0F 8C cd
JNGE rel8off    7C cb       Jump if not greater or equal (SF <> OF).
JNGE rel16off   0F 8C cw
JNGE rel32off   0F 8C cd
JNL rel8off     7D cb       Jump if not less (SF = OF).
JNL rel16off    0F 8D cw
JNL rel32off    0F 8D cd
JGE rel8off     7D cb       Jump if greater or equal (SF = OF).
JGE rel16off    0F 8D cw
JGE rel32off    0F 8D cd
JLE rel8off     7E cb       Jump if less or equal (ZF = 1 or SF <> OF).
JLE rel16off    0F 8E cw
JLE rel32off    0F 8E cd
JNG rel8off     7E cb       Jump if not greater (ZF = 1 or SF <> OF).
JNG rel16off    0F 8E cw
JNG rel32off    0F 8E cd
JNLE rel8off    7F cb       Jump if not less or equal (ZF = 0 and SF = OF).
JNLE rel16off   0F 8F cw
JNLE rel32off   0F 8F cd
JG rel8off      7F cb       Jump if greater (ZF = 0 and SF = OF).
JG rel16off     0F 8F cw
JG rel32off     0F 8F cd


Related Instructions 


JMP (Near), JMP (Far), JrCXZ 


rFLAGS Affected

None

Exceptions

Exception                           Real  Virtual 8086  Protected  Cause of Exception
General protection, #GP             X     X             X          The target offset exceeded the code segment limit or was non-canonical.

JCXZ Jump if rCX Zero
JECXZ 
JRCXZ 


Checks the contents of the count register (rCX) and, if 0, jumps to the target instruction located at the 
specified 8-bit relative offset. Otherwise, execution continues with the instruction following the 
JrCXZ instruction. 


The size of the count register (CX, ECX, or RCX) depends on the address-size attribute of the JrCXZ 
instruction. Therefore, JRCXZ can only be executed in 64-bit mode and JCXZ cannot be executed in 
64-bit mode. 
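
The dependence on the address-size attribute can be illustrated with a small Python model: only the low 16, 32, or 64 bits of the count register take part in the test. The function name is hypothetical.

```python
def jrcxz_taken(rcx, address_bits):
    """True if the count register selected by the address size is zero.

    address_bits = 16 tests CX, 32 tests ECX, and 64 tests RCX.
    """
    if address_bits not in (16, 32, 64):
        raise ValueError("address size must be 16, 32, or 64 bits")
    return (rcx & ((1 << address_bits) - 1)) == 0
```

With RCX = 1_0000_0000h, a JECXZ (32-bit address size) takes the jump because ECX is zero, whereas a JRCXZ does not.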


If the jump is taken, the signed displacement is added to the rIP (of the following instruction) and the 
result is truncated to 16, 32, or 64 bits, depending on operand size. 


In 64-bit mode, the operand size defaults to 64 bits. The processor sign-extends the 8-bit displacement 
value to 64 bits before adding it to the RIP. 


For details about control-flow instructions, see “Control Transfers” in Volume 1, and “Control- 
Transfer Privilege Checks” in Volume 2. 


Mnemonic Opcode Description 

JCXZ rel8off    E3 cb   Jump short if the 16-bit count register (CX) is zero.
JECXZ rel8off   E3 cb   Jump short if the 32-bit count register (ECX) is zero.
JRCXZ rel8off   E3 cb   Jump short if the 64-bit count register (RCX) is zero.


Related Instructions 


Jcc, JMP (Near), JMP (Far) 


rFLAGS Affected 
None 
Exceptions 
Exception                           Real  Virtual 8086  Protected  Cause of Exception
General protection, #GP             X     X             X          The target offset exceeded the code segment limit or was non-canonical.




JMP (Near) Near Jump 


Unconditionally transfers control to a new address without saving the current rIP value. This form of 
the instruction jumps to an address in the current code segment and is called a near jump. The target 
operand can specify a register, a memory location, or a label. 


If the JMP target is specified in a register or memory location, then a 16-, 32-, or 64-bit rIP is read from 
the operand, depending on operand size. This rIP is zero-extended to 64 bits. 


If the JMP target is specified by a displacement in the instruction, the signed displacement is added to 
the rIP (of the following instruction), and the result is truncated to 16, 32, or 64 bits depending on 
operand size. The signed displacement can be 8 bits, 16 bits, or 32 bits, depending on the opcode and 
the operand size. 


For near jumps in 64-bit mode, the operand size defaults to 64 bits. The E9 opcode results in RIP = RIP 
+ 32-bit signed displacement, and the FF /4 opcode results in RIP = 64-bit offset from register or 
memory. No prefix is available to encode a 32-bit operand size in 64-bit mode. 
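
As a worked example of the E9 form in 64-bit mode, the Python sketch below encodes and decodes a near jump with a 32-bit displacement. It assumes the five-byte encoding implied by the opcode table (one opcode byte followed by a four-byte little-endian displacement); the function names are hypothetical.

```python
import struct


def encode_jmp_rel32(instr_addr, target):
    """Encode JMP rel32off (E9 cd) located at instr_addr, jumping to target."""
    next_rip = instr_addr + 5  # one opcode byte plus four displacement bytes
    disp = target - next_rip
    if not -(1 << 31) <= disp < (1 << 31):
        raise ValueError("target is out of range for a 32-bit displacement")
    return b"\xE9" + struct.pack("<i", disp)


def decode_jmp_rel32(instr_addr, code):
    """Return the 64-bit target of an E9 jump stored in `code`."""
    if code[0] != 0xE9:
        raise ValueError("not an E9 near jump")
    (disp,) = struct.unpack("<i", code[1:5])
    # RIP = RIP (of the next instruction) + sign-extended displacement
    return (instr_addr + 5 + disp) & 0xFFFFFFFFFFFFFFFF
```

A jump at 1000h that targets itself encodes as E9 FB FF FF FF, because the displacement is measured from the following instruction at 1005h.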


See JMP (Far) for information on far jumps—jumps to procedures located outside of the current code 
segment. For details about control-flow instructions, see “Control Transfers” in Volume 1, and 
“Control-Transfer Privilege Checks” in Volume 2. 


Mnemonic        Opcode  Description
JMP rel8off     EB cb   Short jump with the target specified by an 8-bit signed
                        displacement.
JMP rel16off    E9 cw   Near jump with the target specified by a 16-bit signed
                        displacement.
JMP rel32off    E9 cd   Near jump with the target specified by a 32-bit signed
                        displacement.
JMP reg/mem16   FF /4   Near jump with the target specified by reg/mem16.
JMP reg/mem32   FF /4   Near jump with the target specified by reg/mem32.
                        (No prefix for encoding in 64-bit mode.)
JMP reg/mem64   FF /4   Near jump with the target specified by reg/mem64.


Related Instructions 


JMP (Far), Jcc, JrCXZ


rFLAGS Affected

None.

Exceptions

Exception                           Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS                          X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP             X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                    X     X             X          The target offset exceeded the code segment limit or was non-canonical.
                                                        X          A null data segment was used to reference memory.
Page fault, #PF                           X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                      X             X          An unaligned memory reference was performed while alignment checking was enabled.

JMP (Far) Far Jump


Unconditionally transfers control to a new address without saving the current CS:rIP values. This form 
of the instruction jumps to an address outside the current code segment and is called a far jump. The 
operand specifies a target selector and offset. 


The target operand can be specified by the instruction directly, by containing the far pointer in the jmp
far opcode itself, or indirectly, by referencing a far pointer in memory. In 64-bit mode, only indirect far
jumps are allowed; executing a direct far jmp (opcode EA) generates an invalid-opcode exception
(#UD). For both direct and indirect far jumps, if the JMP (Far) operand size is 16 bits, the
instruction's operand is a 16-bit selector followed by a 16-bit offset. If the operand size is 32 or 64 bits,
the operand is a 16-bit selector followed by a 32-bit offset.


In all modes, the target selector used by the instruction can be a code selector. Additionally, the target 
selector can also be a call gate in protected mode, or a task gate or TSS selector in legacy protected 
mode. 


• Target is a code segment—Control is transferred to the target CS:rIP. In this case, the target offset
can only be a 16- or 32-bit value, depending on operand size, and is zero-extended to 64 bits; 64-bit
offsets are only available via call gates. No CPL change is allowed.


• Target is a call gate—The call gate specifies the actual target code segment and offset, and control
is transferred to the target CS:rIP. When jumping through a call gate, the size of the target rIP is 16,
32, or 64 bits, depending on the size of the call gate. If the target rIP is less than 64 bits, it is zero-
extended to 64 bits. In long mode, only 64-bit call gates are allowed, and they must point to 64-bit
code segments. No CPL change is allowed.


• Target is a task gate or a TSS—If the mode is legacy protected mode, then a task switch occurs. See
“Hardware Task-Management in Legacy Mode” in Volume 2 for details about task switches.
Hardware task switches are not supported in long mode.


See JMP (Near) for information on near jumps—jumps to procedures located inside the current code 
segment. For details about control-flow instructions, see “Control Transfers” in Volume 1, and 
“Control-Transfer Privilege Checks” in Volume 2. 
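
The layout of an indirect far pointer can be sketched in Python as follows. The ordering in memory used here (offset at the operand address, selector immediately after it) follows the READ_MEM steps in the Action pseudocode for this instruction; the function names are hypothetical, and the real-mode address calculation applies only to real and virtual-8086 modes.

```python
def read_far_pointer(mem, addr, operand_bits):
    """Return (selector, offset) for a far pointer stored at mem[addr:].

    A 16-bit operand size uses a 16-bit offset; 32- and 64-bit operand
    sizes both use a 32-bit offset. The 16-bit selector follows the offset.
    """
    z = 2 if operand_bits == 16 else 4
    offset = int.from_bytes(mem[addr:addr + z], "little")
    selector = int.from_bytes(mem[addr + z:addr + z + 2], "little")
    return selector, offset


def real_mode_linear(selector, offset):
    """CS.base = selector SHL 4, so the target is base + offset."""
    return (selector << 4) + offset
```

For instance, the four bytes 34 12 00 F0 read with a 16-bit operand size give selector F000h and offset 1234h, which in real mode resolves to linear address F1234h.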


Mnemonic             Opcode  Description
JMP FAR pntr16:16    EA cd   Far jump direct, with the target specified by a far pointer
                             contained in the instruction. (Invalid in 64-bit mode.)
JMP FAR pntr16:32    EA cp   Far jump direct, with the target specified by a far pointer
                             contained in the instruction. (Invalid in 64-bit mode.)
JMP FAR mem16:16     FF /5   Far jump indirect, with the target specified by a far
                             pointer in memory (16-bit operand size).
JMP FAR mem16:32     FF /5   Far jump indirect, with the target specified by a far
                             pointer in memory (32- and 64-bit operand size).

Action
// Far jumps (JMPF)
// See “Pseudocode Definition” on page 57.

JMPF_START:

IF (REAL_MODE)
    JMPF_REAL_OR_VIRTUAL
ELSIF (PROTECTED_MODE)
    JMPF_PROTECTED
ELSE // (VIRTUAL_MODE)
    JMPF_REAL_OR_VIRTUAL

JMPF_REAL_OR_VIRTUAL:

    IF (OPCODE == jmpf [mem]) // JMPF Indirect
    {
        temp_RIP = READ_MEM.z [mem]
        temp_CS = READ_MEM.w [mem+Z]
    }
    ELSE // (OPCODE == jmpf direct)
    {
        temp_RIP = z-sized offset specified in the instruction,
                   zero-extended to 64 bits
        temp_CS = selector specified in the instruction
    }

    IF (temp_RIP > CS.limit)
        EXCEPTION [#GP(0)]

    CS.sel = temp_CS
    CS.base = temp_CS SHL 4
    RIP = temp_RIP

    EXIT


JMPF_PROTECTED:

    IF (OPCODE == jmpf [mem]) // JMPF Indirect
    {
        temp_offset = READ_MEM.z [mem]
        temp_sel = READ_MEM.w [mem+Z]
    }
    ELSE // (OPCODE == jmpf direct)
    {
        IF (64BIT_MODE)
            EXCEPTION [#UD] // 'jmpf direct' is illegal in 64-bit mode

        temp_offset = z-sized offset specified in the instruction,
                      zero-extended to 64 bits
        temp_sel = selector specified in the instruction
    }

    temp_desc = READ_DESCRIPTOR (temp_sel, cs_chk)
    // read descriptor, perform protection and type checks

    IF (temp_desc.attr.type == 'available_tss')
        TASK_SWITCH // using temp_sel as the target tss selector
    ELSIF (temp_desc.attr.type == 'taskgate')
        TASK_SWITCH // using the tss selector in the task gate as the
                    // target tss
    ELSIF (temp_desc.attr.type == 'code')
    {
        // if the selector refers to a code descriptor, then
        // the offset we read is the target RIP
        temp_RIP = temp_offset
        CS = temp_desc
        IF ((!64BIT_MODE) && (temp_RIP > CS.limit))
        {
            // temp_RIP can't be non-canonical because
            // it's a 16- or 32-bit offset, zero-extended to 64 bits
            EXCEPTION [#GP(0)]
        }
        RIP = temp_RIP
        EXIT
    }
    ELSE
    {
        // (temp_desc.attr.type == 'callgate')
        // if the selector refers to a call gate, then
        // the target CS and RIP both come from the call gate
        temp_RIP = temp_desc.offset

        IF (LONG_MODE)
        {
            // in long mode, we need to read the 2nd half of a 16-byte call-gate
            // from the gdt/ldt to get the upper 32 bits of the target RIP
            temp_upper = READ_MEM.q [temp_sel+8]
            IF (temp_upper's extended attribute bits != 0)
                EXCEPTION [#GP(temp_sel)] // Make sure the extended
                                          // attribute bits are all zero.

            temp_RIP = temp_RIP + (temp_upper SHL 32)
            // concatenate both halves of RIP
        }
        CS = READ_DESCRIPTOR (temp_desc.segment, clg_chk)
        // set up new CS base, attr, limits
        IF ((64BIT_MODE) && (temp_RIP is non-canonical)
            || (!64BIT_MODE) && (temp_RIP > CS.limit))
            EXCEPTION [#GP(0)]
        RIP = temp_RIP
        EXIT
    }

Related Instructions


JMP (Near), Jcc, JrCXZ


rFLAGS Affected


None, unless a task switch occurs, in which case all flags are modified.


Exceptions 
Exception                           Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                 X     X             X          The far JUMP indirect opcode (FF /5) had a register operand.
                                                        X          The far JUMP direct opcode (EA) was executed in 64-bit mode.
Segment not present, #NP (selector)                     X          The accessed code segment, call gate, task gate, or TSS was not present.
Stack, #SS                          X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP             X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                    X     X             X          The target offset exceeded the code segment limit or was non-canonical.
                                                        X          A null data segment was used to reference memory.
General protection, #GP (selector)                      X          The target code segment selector was a null selector.
                                                        X          A code, call gate, task gate, or TSS descriptor exceeded the descriptor table limit.
                                                        X          A segment selector’s TI bit was set, but the LDT selector was a null selector.
                                                        X          The segment descriptor specified by the instruction was not a code segment, task gate, call gate or available TSS in legacy mode, or not a 64-bit code segment or a 64-bit call gate in long mode.
                                                        X          The RPL of the non-conforming code segment selector specified by the instruction was greater than the CPL, or its DPL was not equal to the CPL.
                                                        X          The DPL of the conforming code segment descriptor specified by the instruction was greater than the CPL.
                                                        X          The DPL of the callgate, taskgate, or TSS descriptor specified by the instruction was less than the CPL or less than its own RPL.
                                                        X          The segment selector specified by the call gate or task gate was a null selector.
                                                        X          The segment descriptor specified by the call gate was not a code segment in legacy mode or not a 64-bit code segment in long mode.
                                                        X          The DPL of the segment descriptor specified by the call gate was greater than the CPL and it is a conforming segment.
                                                        X          The DPL of the segment descriptor specified by the call gate was not equal to the CPL and it is a non-conforming segment.
                                                        X          The 64-bit call gate’s extended attribute bits were not zero.
                                                        X          The TSS descriptor was found in the LDT.
Page fault, #PF                           X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                      X             X          An unaligned memory reference was performed while alignment checking was enabled.

LAHF Load Status Flags into AH Register


Loads the lower 8 bits of the rFLAGS register, including sign flag (SF), zero flag (ZF), auxiliary carry 
flag (AF), parity flag (PF), and carry flag (CF), into the AH register. 


The instruction sets the reserved bits 1, 3, and 5 of the rFLAGS register to 1, 0, and 0, respectively, in 
the AH register. 
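
The resulting AH value can be modelled directly from the bit positions of the five status flags (SF = bit 7, ZF = bit 6, AF = bit 4, PF = bit 2, CF = bit 0) and the fixed values of the reserved bits. The Python function below is an illustrative sketch; its name and argument order are assumptions.

```python
def lahf(sf, zf, af, pf, cf):
    """Return the byte LAHF would place in AH for the given flag values (0 or 1)."""
    return (
        (sf << 7)
        | (zf << 6)
        | (0 << 5)   # reserved bit 5 reads as 0
        | (af << 4)
        | (0 << 3)   # reserved bit 3 reads as 0
        | (pf << 2)
        | (1 << 1)   # reserved bit 1 reads as 1
        | cf
    )
```

With all five flags clear the result is 02h, and with all five set it is D7h.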


The LAHF instruction is available in 64-bit mode if CPUID Fn8000_0001_ECX[LahfSahf] = 1. It is 
always available in the other operating modes (including compatibility mode). 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic  Opcode  Description 
LAHF      9F      Load the SF, ZF, AF, PF, and CF flags into the AH register. 


Related Instructions 


SAHF 


rFLAGS Affected 


None. 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception 
Invalid opcode, #UD |  |  | X | The LAHF instruction is not supported in 64-bit mode, as indicated by CPUID Fn8000_0001_ECX[LahfSahf] = 0. 
LDS Load Far Pointer 
LES 
LFS 
LGS 
LSS 


Loads a far pointer from a memory location (second operand) into a segment register (mnemonic) and 
general-purpose register (first operand). The instruction stores the 16-bit segment selector of the 
pointer into the segment register and the 16-bit or 32-bit offset portion into the general-purpose 
register. The operand-size attribute determines whether the pointer loaded is 32 or 48 bits in length. A 
64-bit operand is not supported. 


These instructions load associated segment-descriptor information into the hidden portion of the 
specified segment register. 


Mnemonic             Opcode    Description 
LDS reg16, mem16:16  C5 /r     Load DS:reg16 with a far pointer from memory. [Redefined as VEX (2-byte prefix) in 64-bit mode.] 
LDS reg32, mem16:32  C5 /r     Load DS:reg32 with a far pointer from memory. [Redefined as VEX (2-byte prefix) in 64-bit mode.] 
LES reg16, mem16:16  C4 /r     Load ES:reg16 with a far pointer from memory. [Redefined as VEX (3-byte prefix) in 64-bit mode.] 
LES reg32, mem16:32  C4 /r     Load ES:reg32 with a far pointer from memory. [Redefined as VEX (3-byte prefix) in 64-bit mode.] 
LFS reg16, mem16:16  0F B4 /r  Load FS:reg16 with a 32-bit far pointer from memory. 
LFS reg32, mem16:32  0F B4 /r  Load FS:reg32 with a 48-bit far pointer from memory. 
LGS reg16, mem16:16  0F B5 /r  Load GS:reg16 with a 32-bit far pointer from memory. 
LGS reg32, mem16:32  0F B5 /r  Load GS:reg32 with a 48-bit far pointer from memory. 
LSS reg16, mem16:16  0F B2 /r  Load SS:reg16 with a 32-bit far pointer from memory. 
LSS reg32, mem16:32  0F B2 /r  Load SS:reg32 with a 48-bit far pointer from memory. 


Related Instructions 


None 


rFLAGS Affected 


None 
Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception 
Invalid opcode, #UD | X | X | X | The source operand was a register. 
Invalid opcode, #UD |  |  | X | LDS or LES was executed in 64-bit mode and not subject to interpretation as a VEX prefix. 
Segment not present, #NP (selector) |  |  | X | The DS, ES, FS, or GS register was loaded with a non-null segment selector and the segment was marked not present. 
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. 
Stack, #SS (selector) |  |  | X | The SS register was loaded with a non-null segment selector and the segment was marked not present. 
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. 
General protection, #GP |  |  | X | A null data segment was used to reference memory. 
General protection, #GP (selector) |  |  | X | A segment register was loaded, but the segment descriptor exceeded the descriptor table limit. 
General protection, #GP (selector) |  |  | X | A segment register was loaded and the segment selector’s TI bit was set, but the LDT selector was a null selector. 
General protection, #GP (selector) |  |  | X | The SS register was loaded with a null segment selector in non-64-bit mode or while CPL = 3. 
General protection, #GP (selector) |  |  | X | The SS register was loaded and the segment selector RPL and the segment descriptor DPL were not equal to the CPL. 
General protection, #GP (selector) |  |  | X | The SS register was loaded and the segment pointed to was not a writable data segment. 
General protection, #GP (selector) |  |  | X | The DS, ES, FS, or GS register was loaded and the segment pointed to was a data or non-conforming code segment, but the RPL or CPL was greater than the DPL. 
General protection, #GP (selector) |  |  | X | The DS, ES, FS, or GS register was loaded and the segment pointed to was not a data segment or readable code segment. 
Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. 
Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. 
LEA Load Effective Address 


Computes the effective address of a memory location (second operand) and stores it in a general- 
purpose register (first operand). 


The address size of the memory location and the size of the register determine the specific action taken 
by the instruction, as follows: 


• If the address size and the register size are the same, the instruction stores the effective address as 
computed. 

• If the address size is longer than the register size, the instruction truncates the effective address to 
the size of the register. 

• If the address size is shorter than the register size, the instruction zero-extends the effective address 
to the size of the register. 
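
The three size rules above can be sketched as a small model. This is an illustrative sketch only; the function name `lea_result` and its parameters are hypothetical, and sizes are given in bits.

```python
def lea_result(effective_address: int, address_bits: int, register_bits: int) -> int:
    """Model the value LEA stores for a given address size and register size."""
    # The effective address is first computed modulo the address size.
    ea = effective_address & ((1 << address_bits) - 1)
    # Same size: stored as computed.
    # Address longer than register: the mask below truncates it.
    # Address shorter than register: the upper bits are already zero,
    # which is the zero-extension case.
    return ea & ((1 << register_bits) - 1)
```

For instance, `lea_result(0x12345678, 32, 16)` gives 5678h (truncation), while a 32-bit address stored in a 64-bit register keeps its value with zeros in the upper 32 bits.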


If the second operand is a register, an undefined-opcode exception occurs. 


The LEA instruction is related to the MOV instruction, which copies data from a memory location to a 
register, but LEA takes the address of the source operand, whereas MOV takes the contents of the 
memory location specified by the source operand. In the simplest cases, LEA can be replaced with 
MOV. For example: 


lea eax, [ebx] 


has the same effect as: 

mov eax, ebx 
However, LEA allows software to use any valid ModRM and SIB addressing mode for the source 
operand. For example: 

lea eax, [ebx+edi] 


loads the sum of the EBX and EDI registers into the EAX register. This could not be accomplished by 
a single MOV instruction. 


The LEA instruction has a limited capability to perform multiplication of operands in general-purpose 
registers using scaled-index addressing. For example: 


lea eax, [ebx+ebx*8] 


loads the value of the EBX register, multiplied by 9, into the EAX register. Possible values of 
multipliers are 2, 4, 8, 3, 5, and 9. 
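
The list of multipliers follows from the addressing form base + index*scale when the same register serves as both base and index. A brief sketch of that derivation, assuming the SIB scale factors 1, 2, 4, and 8 (the function name is hypothetical):

```python
SCALES = (1, 2, 4, 8)  # scale factors available in scaled-index addressing


def lea_multipliers() -> list[int]:
    """Enumerate the non-trivial multipliers reachable with one register."""
    index_only = set(SCALES)                  # [reg*scale]
    base_plus_index = {1 + s for s in SCALES} # [reg + reg*scale]
    # A multiplier of 1 is a plain copy, so it is left out.
    return sorted((index_only | base_plus_index) - {1})
```

Calling `lea_multipliers()` returns `[2, 3, 4, 5, 8, 9]`, the same set given in the text.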


The LEA instruction is widely used in string-processing and array-processing to initialize an index 
register (rSI or rDI) before performing string instructions such as MOVSx. It is also used to initialize 
the rBX register before performing the XLAT instruction in programs that perform character 
translations. In data structures, the LEA instruction can calculate addresses of operands stored in 
memory, and in particular, addresses of array or string elements. 
Mnemonic Opcode Description 

LEA reg16, mem 8D /r Store effective address in a 16-bit register. 

LEA reg32, mem 8D /r Store effective address in a 32-bit register. 

LEA reg64, mem 8D /r Store effective address in a 64-bit register. 


Related Instructions 


MOV 


rFLAGS Affected 


None 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception 
Invalid opcode, #UD | X | X | X | The source operand was a register. 
LEAVE Delete Procedure Stack Frame 


Releases a stack frame created by a previous ENTER instruction. To release the frame, it copies the 
frame pointer (in the rBP register) to the stack pointer register (rSP), and then pops the old frame 
pointer from the stack into the rBP register, thus restoring the stack frame of the calling procedure. 
The 32-bit LEAVE instruction is equivalent to the following 32-bit operation: 


MOV ESP, EBP 
POP EBP 


To return program control to the calling procedure, execute a RET instruction after the LEAVE 
instruction. 


In 64-bit mode, the LEAVE operand size defaults to 64 bits, and there is no prefix available for 
encoding a 32-bit operand size. 
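
The frame release can be modeled as follows. This is a minimal sketch under stated assumptions: the register file is a hypothetical dictionary with keys `"sp"` and `"bp"`, the stack is a dictionary from address to stored value, and `operand_bytes` is 2, 4, or 8 according to the operand size.

```python
def leave(regs: dict, stack: dict, operand_bytes: int = 4) -> None:
    """Model LEAVE: copy rBP to rSP, then pop the saved frame pointer into rBP."""
    regs["sp"] = regs["bp"]          # MOV rSP, rBP
    regs["bp"] = stack[regs["sp"]]   # POP rBP: read the saved frame pointer...
    regs["sp"] += operand_bytes      # ...and release its stack slot


regs = {"sp": 0x0FE0, "bp": 0x0FF8}
stack = {0x0FF8: 0x1020}             # caller's frame pointer saved by ENTER
leave(regs, stack)
```

After the call, `regs["bp"]` holds the caller's frame pointer (1020h) and `regs["sp"]` points just above the released slot (0FFCh), ready for a RET.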


Mnemonic  Opcode  Description 
LEAVE     C9      Set the stack pointer register SP to the value in the BP register and pop BP. 
LEAVE     C9      Set the stack pointer register ESP to the value in the EBP register and pop EBP. (No prefix for encoding this in 64-bit mode.) 
LEAVE     C9      Set the stack pointer register RSP to the value in the RBP register and pop RBP. 


Related Instructions 


ENTER 


rFLAGS Affected 


None 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception 
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. 
Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. 
Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. 
LFENCE Load Fence 


Acts as a barrier to force strong memory ordering (serialization) between load instructions preceding 
the LFENCE and load instructions that follow the LFENCE. Loads from differing memory types may 
be performed out of order, in particular between WC/WC+ and other memory types. The LFENCE 
instruction assures that the system completes all previous loads before executing subsequent loads. 


The LFENCE instruction is weakly-ordered with respect to store instructions, data and instruction 
prefetches, and the SFENCE instruction. Speculative loads initiated by the processor, or specified 
explicitly using cache-prefetch instructions, can be reordered around an LFENCE. 


In addition to load instructions, the LFENCE instruction is strongly ordered with respect to other 
LFENCE instructions, as well as MFENCE and other serializing instructions. Further details on the 
use of MFENCE to order accesses among differing memory types may be found in AMD64 
Architecture Programmer’s Manual Volume 2: System Programming, section 7.4 “Memory Types” on 
page 172. 


LFENCE is an SSE2 instruction. Support for SSE2 instructions is indicated by CPUID 
Fn0000_0001_EDX[SSE2] = 1. 


In some systems, LFENCE may be configured to be dispatch serializing. In systems where CPUID 
Fn8000_0021_EAX[LFenceAlwaysSerializing](bit 2) = 1, LFENCE is always dispatch serializing. 
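
The two feature checks reduce to single-bit tests on register values returned by CPUID. The sketch below assumes the caller has already obtained those values; the function names are hypothetical, and the position of the SSE2 flag (EDX bit 26) is taken from the CPUID definition rather than from this page.

```python
SSE2_BIT = 26                      # Fn0000_0001_EDX[SSE2] (bit position assumed from CPUID definition)
LFENCE_ALWAYS_SERIALIZING_BIT = 2  # Fn8000_0021_EAX[LFenceAlwaysSerializing]


def has_sse2(edx_fn0000_0001: int) -> bool:
    """True if the EDX value from CPUID Fn0000_0001 reports SSE2 support."""
    return bool((edx_fn0000_0001 >> SSE2_BIT) & 1)


def lfence_always_serializing(eax_fn8000_0021: int) -> bool:
    """True if the EAX value from CPUID Fn8000_0021 reports that LFENCE is
    always dispatch serializing."""
    return bool((eax_fn8000_0021 >> LFENCE_ALWAYS_SERIALIZING_BIT) & 1)
```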


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Opcode Description 
LFENCE 0F AE E8 Force strong ordering of (serialize) load operations. 


Related Instructions 


MFENCE, SFENCE, MCOMMIT 


rFLAGS Affected 


None 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception 
Invalid opcode, #UD | X | X | X | SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0. 
LLWPCB Load Lightweight Profiling Control Block Address 


Parses the Lightweight Profiling Control Block at the address contained in the specified register. If the 
LWPCB is valid, writes the address into the LWP_CBADDR MSR and enables Lightweight Profiling. 


See Volume 2, Chapter 13, for an overview of the lightweight profiling facility. 


The LWPCB must be in memory that is readable and writable in user mode. For better performance, it 
should be aligned on a 64-byte boundary in memory and placed so that it does not cross a page 
boundary, though neither of these suggestions is required. 


The LWPCB address in the register is truncated to 32 bits if the operand size is 32. 


Action 


1. If LWP is not available or if the machine is not in protected mode, LLWPCB immediately causes 
a #UD exception. 


2. If LWP is already enabled, the processor flushes the LWP state to memory in the old LWPCB. See 
description of the SLWPCB instruction on page 340 for details on saving the active LWP state. 


If the flush causes a #PF exception, LWP remains enabled with the old LWPCB still active. Note 
that the flush is done before LWP attempts to access the new LWPCB. 


3. If the specified LWPCB address is 0, LWP is disabled and the execution of LLWPCB is complete. 


4. The LWPCB address is non-zero. LLWPCB validates it as follows: 


- If any part of the LWPCB or the ring buffer is beyond the data segment limit, LLWPCB causes 
a #GP exception. 


- If the ring buffer size is below the implementation’s minimum ring buffer size, LLWPCB 
causes a #GP exception. 


- While doing these checks, LWP reads and writes the LWPCB, which may cause a #PF 
exception. 


If any of these exceptions occurs, LLWPCB aborts and LWP is left disabled. Usually, the operating 
system will handle a #PF exception by making the memory available and returning to retry the 
LLWPCB instruction. The #GP exceptions indicate application programming errors. 


5. LWP converts the LWPCB address and the ring buffer address to linear address form by adding 
the DS base address and stores the addresses internally. 


6. LWP examines the LWPCB.Flags field to determine which events should be enabled and whether 
threshold interrupts should be taken. It clears the bits for any features that are not available and 
stores the result back to LWPCB.Flags to inform the application of the actual LWP state. 


7. For each event being enabled, LWP examines the EventIntervaln value and, if necessary, sets it to 
an implementation-defined minimum. (The minimum event interval for LWPVAL is zero.) It 
loads its internal counter for the event from the value in EventCountern. A zero or negative value 
in EventCountern means that the next event of that type will cause an event record to be stored. To 
count every jth event, a program should set EventIntervaln to j-1 and EventCountern to some 
starting value (where j-1 is a good initial count). If the counter value is larger than the interval, the 
first event record will be stored after a larger number of events than subsequent records. 


8. LWP is started. The execution of LLWPCB is complete. 
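
The relationship between EventIntervaln and EventCountern in step 7 can be illustrated with a small simulation. This is a sketch only: it assumes the counter is decremented once per event, a record is stored when the result is negative, and the counter is then reloaded from the interval. Interval randomization and the implementation-defined minimum interval are ignored, and the function name is hypothetical.

```python
def events_that_store_records(event_interval: int, event_counter: int,
                              total_events: int) -> list[int]:
    """Return the 1-based event numbers at which an event record is stored."""
    stored = []
    counter = event_counter
    for n in range(1, total_events + 1):
        counter -= 1                  # each event decrements the counter
        if counter < 0:               # negative result: store a record
            stored.append(n)
            counter = event_interval  # reload from the interval
    return stored
```

With j = 4 (interval 3, counter 3), `events_that_store_records(3, 3, 12)` returns `[4, 8, 12]`. Starting with a counter of 0 stores the very first event, and a counter larger than the interval delays only the first record.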


Notes 


If none of the bits in the LWPCB.Flags specifies an available event, LLWPCB still enables LWP to 
allow the use of the LWPINS instruction. However, no other event records will be stored. 


A program can temporarily disable LWP by executing SLWPCB to obtain the current LWPCB 
address, saving that value, and then executing LLWPCB with a register containing 0. It can later re- 
enable LWP by executing LLWPCB with a register containing the saved address. 


When LWP is enabled, it is typically an error to execute LLWPCB with the address of the active 
LWPCB. When the hardware flushes the existing LWP state into the LWPCB, it may overwrite fields 
that the application may have set to new LWP parameter values. The flushed values will then be loaded 
as LWP is restarted. To reuse an LWPCB, an application should stop LWP by passing a zero to 
LLWPCB, then prepare the LWPCB with new parameters and execute LLWPCB again to restart LWP. 


Internally, LWP keeps the linear address of the LWPCB and the ring buffer. If the application changes 
the value of DS, LWP will continue to collect samples even if the new DS value would no longer allow 
access to the LWPCB or the ring buffer. However, a #GP fault will occur if the application uses XRSTOR 
to restore LWP state saved by XSAVE. Programs should avoid using XSAVE/XRSTOR on LWP state 
if DS has changed. This only applies when the CPL != 0; kernel mode operation of XRSTOR is 
unaffected by changes to DS. See instruction listing for XSAVE in Volume 4 for details. 


Operating system and hypervisor code that runs when CPL != 3 should use XSAVE and XRSTOR to 
control LWP rather than using LLWPCB. Use WRMSR to write 0 to the LWP_CBADDR MSR to 
immediately stop LWP without saving its current state. 


It is possible to execute LLWPCB when the CPL != 3 or when SMM is active, but the system software 
must ensure that the LWPCB and the entire ring buffer are properly mapped into writable memory in 
order to avoid a #PF or #GP fault. Furthermore, if LWP is enabled when a kernel executes LLWPCB, 
both the old and new control blocks and ring buffers must be accessible. Using LLWPCB in these 
situations is not recommended. 


LLWPCB is an LWP instruction. Support for LWP instructions is indicated by CPUID 
Fn8000_0001_ECX[LWP] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 
Instruction Encoding 


Mnemonic      Encoding 
              XOP  RXB.map_select  W.vvvv.L.pp  Opcode 
LLWPCB reg32  8F   RXB.09          0.1111.0.00  12 /0 
LLWPCB reg64  8F   RXB.09          1.1111.0.00  12 /0 


ModRM.reg augments the opcode and is assigned the value 0. ModRM.r/m (augmented by XOP.R) 
specifies the register containing the effective address of the LWPCB. ModRM.mod is 11b. 


Related Instructions 


SLWPCB, LWPVAL, LWPINS 


rFLAGS Affected 


None 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception 
Invalid opcode, #UD | X | X | X | LWP instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[LWP] = 0. 
Invalid opcode, #UD | X | X |  | The system is not in protected mode. 
Invalid opcode, #UD |  |  | X | LWP is not available, or mod != 11b, or vvvv != 1111b. 
General protection, #GP |  |  | X | Any part of the LWPCB or the event ring buffer is beyond the DS segment limit. 
General protection, #GP |  |  | X | Any restrictions on the contents of the LWPCB are violated. 
Page fault, #PF |  |  | X | A page fault resulted from reading or writing the LWPCB. 
Page fault, #PF |  |  | X | LWP was already enabled and a page fault resulted from reading or writing the old LWPCB. 
Page fault, #PF |  |  | X | LWP was already enabled and a page fault resulted from flushing an event to the old ring buffer. 
LODS Load String 
LODSB 

LODSW 

LODSD 

LODSQ 

Copies the byte, word, doubleword, or quadword in the memory location pointed to by the DS:rSI 
registers to the AL, AX, EAX, or RAX register, depending on the size of the operand, and then 
increments or decrements the rSI register according to the state of the DF flag in the rFLAGS register. 


If the DF flag is 0, the instruction increments rSI; otherwise, it decrements rSI. It increments or 
decrements rSI by 1, 2, 4, or 8, depending on the number of bytes being loaded. 
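
The load-and-step behavior can be sketched as a small model. The function name `lods` is hypothetical; memory is represented as a `bytes` object indexed by rSI, segmentation is ignored, and values are read in little-endian byte order as on x86.

```python
def lods(memory: bytes, rsi: int, size: int, df: int) -> tuple[int, int]:
    """Model one LODSx: return (value loaded into rAX, updated rSI).

    size is 1, 2, 4, or 8 bytes (LODSB, LODSW, LODSD, LODSQ).
    """
    value = int.from_bytes(memory[rsi:rsi + size], "little")
    step = -size if df else size   # DF = 0 increments rSI, DF = 1 decrements it
    return value, rsi + step


memory = bytes([0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88])
forward = lods(memory, 0, 4, 0)    # doubleword load with DF = 0
backward = lods(memory, 4, 2, 1)   # word load with DF = 1
```

Here `forward` is `(0x44332211, 4)` and `backward` is `(0x6655, 2)`.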


The forms of the LODS instruction with an explicit operand address the operand at seg:[rSI]. The 
value of seg defaults to the DS segment, but may be overridden by a segment prefix. The explicit 
operand serves only to specify the type (size) of the value being copied and the specific registers used. 


The no-operands forms of the instruction always use the DS:[rSI] registers to point to the value to be 
copied (they do not allow a segment prefix). The mnemonic determines the size of the operand and the 
specific registers used. 


The LODSx instructions support the REP prefixes. For details about the REP prefixes, see “Repeat 
Prefixes” on page 12. More often, software uses the LODSx instruction inside a loop controlled by a 
LOOPcc instruction as a more efficient replacement for instructions like: 


mov eax, dword ptr ds:[esi] 
add esi, 4 


The LODSQ instruction can only be used in 64-bit mode. 


Mnemonic    Opcode  Description 
LODS mem8   AC      Load byte at DS:rSI into AL and then increment or decrement rSI. 
LODS mem16  AD      Load word at DS:rSI into AX and then increment or decrement rSI. 
LODS mem32  AD      Load doubleword at DS:rSI into EAX and then increment or decrement rSI. 
LODS mem64  AD      Load quadword at DS:rSI into RAX and then increment or decrement rSI. 
LODSB       AC      Load byte at DS:rSI into AL and then increment or decrement rSI. 
LODSW       AD      Load the word at DS:rSI into AX and then increment or decrement rSI. 
LODSD       AD      Load doubleword at DS:rSI into EAX and then increment or decrement rSI. 
LODSQ       AD      Load quadword at DS:rSI into RAX and then increment or decrement rSI. 
Related Instructions 


MOVSx, STOSx 


rFLAGS Affected 


None 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception 
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical. 
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical. 
General protection, #GP |  |  | X | A null data segment was used to reference memory. 
Page fault, #PF |  | X | X | A page fault resulted from the execution of the instruction. 
Alignment check, #AC |  | X | X | An unaligned memory reference was performed while alignment checking was enabled. 
LOOP Loop 
LOOPE 

LOOPNE 

LOOPNZ 

LOOPZ 

Decrements the count register (rCX) by 1, then, if rCX is not 0 and the ZF flag meets the condition 
specified by the mnemonic, it jumps to the target instruction specified by the signed 8-bit relative 
offset. Otherwise, it continues with the next instruction after the LOOPcc instruction. 


The size of the count register used (CX, ECX, or RCX) depends on the address-size attribute of the 
LOOPcc instruction. 


The LOOP instruction ignores the state of the ZF flag. 


The LOOPE and LOOPZ instructions jump if rCX is not 0 and the ZF flag is set to 1. In other words, 
the instruction exits the loop (falls through to the next instruction) if rCX becomes 0 or ZF = 0. 


The LOOPNE and LOOPNZ instructions jump if rCX is not 0 and ZF flag is cleared to 0. In other 
words, the instruction exits the loop if rCX becomes 0 or ZF = 1. 


The LOOPcc instruction does not change the state of the ZF flag. Typically, the loop contains a 
compare instruction to set or clear the ZF flag. 
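
The branch decision for each mnemonic can be summarized in a short model. This is an illustrative sketch; the function name `loopcc` is hypothetical, and `count_bits` stands for the width of the count register selected by the address-size attribute (16, 32, or 64).

```python
def loopcc(mnemonic: str, rcx: int, zf: int, count_bits: int = 64) -> tuple[int, bool]:
    """Model LOOPcc: return (new count register value, whether the jump is taken)."""
    mask = (1 << count_bits) - 1
    rcx = (rcx - 1) & mask                 # decrement first, wrapping at the register width
    if mnemonic == "LOOP":
        condition = True                   # LOOP ignores ZF
    elif mnemonic in ("LOOPE", "LOOPZ"):
        condition = zf == 1
    elif mnemonic in ("LOOPNE", "LOOPNZ"):
        condition = zf == 0
    else:
        raise ValueError(f"unknown mnemonic: {mnemonic}")
    return rcx, (rcx != 0 and condition)   # ZF itself is never modified
```

Note that because the decrement happens before the test, entering LOOP with a count of 0 wraps the register to its maximum value and takes the jump.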


If the jump is taken, the signed displacement is added to the rIP (of the following instruction) and the 
result is truncated to 16, 32, or 64 bits, depending on operand size. 


In 64-bit mode, the operand size defaults to 64 bits without the need for a REX prefix, and the 
processor sign-extends the 8-bit offset before adding it to the RIP. 


Mnemonic        Opcode  Description 
LOOP rel8off    E2 cb   Decrement rCX, then jump short if rCX is not 0. 
LOOPE rel8off   E1 cb   Decrement rCX, then jump short if rCX is not 0 and ZF is 1. 
LOOPNE rel8off  E0 cb   Decrement rCX, then jump short if rCX is not 0 and ZF is 0. 
LOOPNZ rel8off  E0 cb   Decrement rCX, then jump short if rCX is not 0 and ZF is 0. 
LOOPZ rel8off   E1 cb   Decrement rCX, then jump short if rCX is not 0 and ZF is 1. 


Related Instructions 


None 


rFLAGS Affected 


None 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception 
General protection, #GP | X | X | X | The target offset exceeded the code segment limit or was non-canonical. 
LWPINS Lightweight Profiling Insert Record 


Inserts programmed event record into the LWP event ring buffer in memory and advances the ring 
buffer pointer. 


Refer to the description of the programmed event record in Volume 2, Chapter 13. The record has an 
EventId of 255. The value in the register specified by vvvv (first operand) is stored in the Data2 field at 
bytes 23-16 (zero extended if the operand size is 32). The value in a register or memory location 
(second operand) is stored in the Data1 field at bytes 7-4. The immediate value (third operand) is 
truncated to 16 bits and stored in the Flags field at bytes 3-2. 
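
The placement of the three operands within the record can be sketched with byte offsets. This is a hedged illustration: the 32-byte record length (`RECORD_BYTES`) is an assumption not stated on this page, the function name is hypothetical, and fields other than Flags, Data1, and Data2 (including EventId) are left zero because their offsets are not given here.

```python
import struct

RECORD_BYTES = 32  # assumed record length; see Volume 2, Chapter 13 for the real layout


def fill_record_fields(data2: int, data1: int, imm32: int,
                       operand_bits: int = 64) -> bytes:
    """Place the LWPINS operands into the Flags, Data1, and Data2 fields."""
    record = bytearray(RECORD_BYTES)
    # Third operand: truncated to 16 bits, stored in Flags at bytes 3-2.
    struct.pack_into("<H", record, 2, imm32 & 0xFFFF)
    # Second operand: 32-bit value stored in Data1 at bytes 7-4.
    struct.pack_into("<I", record, 4, data1 & 0xFFFFFFFF)
    # First operand: stored in Data2 at bytes 23-16, zero-extended
    # when the operand size is 32 bits.
    struct.pack_into("<Q", record, 16, data2 & ((1 << operand_bits) - 1))
    return bytes(record)
```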


If the ring buffer is not full, or if LWP is running in Continuous Mode, the head pointer is advanced 
and the CF flag is cleared. If the ring buffer threshold is exceeded and threshold interrupts are enabled, 
an interrupt is signaled. If LWP is in Continuous Mode and the new head pointer equals the tail pointer, 
the MissedEvents counter is incremented to indicate that the buffer wrapped. 


If the ring buffer is full and LWP is running in Synchronized Mode, the event record overwrites the last 
record in the buffer, the MissedEvents counter in the LWPCB is incremented, the head pointer is not 
advanced, and the CF flag is set. 
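
The head-pointer, CF, and MissedEvents behavior described in the two paragraphs above can be modeled as a state update. The sketch makes one assumption that is not stated on this page: the buffer is treated as full when advancing the head pointer would make it equal to the tail pointer. The names `RingState` and `lwpins_update` are hypothetical, pointers are modeled as slot indices, and threshold interrupts and the record contents are omitted.

```python
from dataclasses import dataclass


@dataclass
class RingState:
    slots: int              # number of record slots in the ring buffer
    head: int = 0           # slot index of the head pointer
    tail: int = 0           # slot index of the tail pointer
    missed_events: int = 0  # MissedEvents counter in the LWPCB
    cf: int = 0             # carry flag as left by the last LWPINS


def lwpins_update(state: RingState, continuous: bool) -> RingState:
    """Apply the pointer and flag effects of one LWPINS to the model state."""
    next_head = (state.head + 1) % state.slots
    full = next_head == state.tail   # assumed definition of "full"
    if full and not continuous:
        # Synchronized Mode, buffer full: head stays put, the event is
        # counted as missed, and CF is set.
        state.missed_events += 1
        state.cf = 1
        return state
    state.head = next_head
    state.cf = 0
    if continuous and state.head == state.tail:
        state.missed_events += 1     # the buffer wrapped
    return state
```

Software can test CF after LWPINS to learn whether the record was dropped in Synchronized Mode.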


LWPINS generates an invalid opcode exception (#UD) if the machine is not in protected mode or if 
LWP is not available. 


LWPINS simply clears CF if LWP is not enabled. This allows LWPINS instructions to be harmlessly 
ignored if profiling is turned off. 


It is possible to execute LWPINS when the CPL != 3 or when SMM is active, but the system software 
must ensure that the memory operand (if present), the LWPCB, and the entire ring buffer are properly 
mapped into writable memory in order to avoid a #PF or #GP fault. Using LWPINS in these situations 
is not recommended. 


LWPINS can be used by a program to mark significant events in the ring buffer as they occur. For 
instance, a program might capture information on changes in the process’ address space such as library 
loads and unloads, or changes in the execution environment such as a change in the state of a user- 
mode thread of control. 


Note that when the LWPINS instruction finishes writing an event record in the event ring buffer, it 
counts as an instruction retired. If the Instructions Retired event is active, this might cause that counter 
to become negative and immediately store another event record with the same instruction address (but 
different EventId values). 


LWPINS is an LWP instruction. Support for LWP instructions is indicated by CPUID 
Fn8000_0001_ECX[LWP] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 
Instruction Encoding 


Mnemonic                             Encoding 
                                     XOP  RXB.map_select  W.vvvv.L.pp  Opcode 
LWPINS reg32.vvvv, reg/mem32, imm32  8F   RXB.0A          0.src1.0.00  12 /0 /imm32 
LWPINS reg64.vvvv, reg/mem32, imm32  8F   RXB.0A          1.src1.0.00  12 /0 /imm32 


ModRM.reg augments the opcode and is assigned the value 0. The {mod, r/m} field of the ModRM 
byte (augmented by XOP.R) encodes the second operand. A 4-byte immediate field follows ModRM. 


Related Instructions 


LLWPCB, SLWPCB, LWPVAL 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF 
   |     |     |    |    |    |    |       |    |    |    |    |    |    |    |    | M 
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0 


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception 
Invalid opcode, #UD | X | X | X | LWP instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[LWP] = 0. 
Invalid opcode, #UD | X | X |  | The system is not in protected mode. 
Invalid opcode, #UD |  |  | X | LWP is not available. 
Page fault, #PF |  |  | X | A page fault resulted from reading or writing the LWPCB. 
Page fault, #PF |  |  | X | A page fault resulted from writing the event to the ring buffer. 
Page fault, #PF |  |  | X | A page fault resulted from reading a modrm operand from memory. 
General protection, #GP |  |  | X | A modrm operand in memory exceeded the segment limit. 
LWPVAL Lightweight Profiling Insert Value 


Decrements the event counter associated with the programmed value sample event (see “Programmed 
Value Sample” in Volume 2, Chapter 13). If the resulting counter value is negative, inserts an event 
record into the LWP event ring buffer in memory and advances the ring buffer pointer. 


Refer to the description of the programmed value sample record in Volume 2, Chapter 13. The event 
record has an EventId of 1. The value in the register specified by vvvv (first operand) is stored in the 
Data2 field at bytes 23-16 (zero extended if the operand size is 32). The value in a register or memory 
location (second operand) is stored in the Data1 field at bytes 7-4. The immediate value (third 
operand) is truncated to 16 bits and stored in the Flags field at bytes 3-2. 


If the programmed value sample record is not written to the event ring buffer, the memory location of 
the second operand (assuming it is memory-based) is not accessed. 


If the ring buffer is not full or if LWP is running in Continuous Mode, the head pointer is advanced and 
the event counter is reset to the interval for the event (subject to randomization). If the ring buffer 
threshold is exceeded and threshold interrupts are enabled, an interrupt is signaled. If LWP is in 
Continuous Mode and the new head pointer equals the tail pointer, the MissedEvents counter is 
incremented to indicate that the buffer wrapped. 


If the ring buffer is full and LWP is running in Synchronized Mode, the event record overwrites the last 
record in the buffer, the MissedEvents counter in the LWPCB is incremented, and the head pointer is 
not advanced. 


LWPVAL generates an invalid opcode exception (#UD) if the machine is not in protected mode or if 
LWP is not available. 


LWPVAL does nothing if LWP is not enabled or if the Programmed Value Sample event is not enabled 
in LWPCB.Flags. This allows LWPVAL instructions to be harmlessly ignored if profiling is turned off. 


It is possible to execute LWPVAL when the CPL != 3 or when SMM is active, but the system software 
must ensure that the memory operand (if present), the LWPCB, and the entire ring buffer are properly 
mapped into writable memory in order to avoid a #PF or #GP fault. Using LWPVAL in these situations 
is not recommended. 


LWPVAL can be used by a program to perform value profiling. This is the technique of sampling the 
value of some program variable at a predetermined frequency. For example, a managed runtime might 
use LWPVAL to sample the value of the divisor for a frequently executed divide instruction in order to 
determine whether to generate specialized code for a common division. It might sample the target 
location of an indirect branch or call to see if one destination is more frequent than others. Since 
LWPVAL does not modify any registers or condition codes, it can be inserted harmlessly between any 
instructions. 
Note 


When LWPVAL completes (whether or not it stored an event record in the event ring buffer), it counts 
as an instruction retired. If the Instructions Retired event is active, this might cause that counter to 
become negative and immediately store an event record. If LWPVAL also stored an event record, the 
buffer will contain two records with the same instruction address (but different EventId values). 


LWPVAL is an LWP instruction. Support for LWP instructions is indicated by CPUID 
Fn8000_0001_ECX[LWP] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Instruction Encoding 


Mnemonic                             Encoding 
                                     XOP  RXB.map_select  W.vvvv.L.pp  Opcode 
LWPVAL reg32.vvvv, reg/mem32, imm32  8F   RXB.0A          0.src1.0.00  12 /1 /imm32 
LWPVAL reg64.vvvv, reg/mem32, imm32  8F   RXB.0A          1.src1.0.00  12 /1 /imm32 


ModRM.reg augments the opcode and is assigned the value 001b. The {mod, r/m} field of the 
ModRM byte (augmented by XOP.R) encodes the second operand. A four-byte immediate field 
follows ModRM. 


Related Instructions 


LLWPCB, SLWPCB, LWPINS 


rFLAGS Affected 


None 
Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception 
Invalid opcode, #UD | X | X | X | LWP instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[LWP] = 0. 
Invalid opcode, #UD | X | X |  | The system is not in protected mode. 
Invalid opcode, #UD |  |  | X | LWP is not available. 
Page fault, #PF |  |  | X | A page fault resulted from reading or writing the LWPCB. 
Page fault, #PF |  |  | X | A page fault resulted from writing the event to the ring buffer. 
Page fault, #PF |  |  | X | A page fault resulted from reading a modrm operand from memory. 
General protection, #GP |  |  | X | A modrm operand in memory exceeded the segment limit. 
LZCNT Count Leading Zeros 


Counts the number of leading zero bits in the 16-, 32-, or 64-bit general purpose register or memory 
source operand. Counting starts downward from the most significant bit and stops when the highest bit 
having a value of 1 is encountered or when the least significant bit is encountered. The count is written 
to the destination register. 


This instruction has two operands: 
LZCNT dest, src 


If the input operand is zero, CF is set to 1 and the size (in bits) of the input operand is written to the 
destination register. Otherwise, CF is cleared. 


If the most significant bit is a one, the ZF flag is set to 1 and zero is written to the destination register.
Otherwise, ZF is cleared. 
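
The count and flag rules above can be checked against a small software model. The following C sketch is illustrative only: the names lzcnt_result and lzcnt_model are hypothetical, and the model covers only the destination value, CF, and ZF as described in the text (flags that the instruction leaves undefined are not modeled).

```c
#include <assert.h>
#include <stdint.h>

/* Result of the modeled LZCNT: destination value plus the two defined flags. */
typedef struct {
    unsigned count; /* value written to the destination register */
    int cf;         /* 1 if the source operand was zero */
    int zf;         /* 1 if the count is zero (most significant bit was set) */
} lzcnt_result;

/* Software model of LZCNT for an operand that is `bits` wide (16, 32, or 64). */
static lzcnt_result lzcnt_model(uint64_t src, unsigned bits)
{
    assert(bits == 16 || bits == 32 || bits == 64);
    lzcnt_result r = {0, 0, 0};
    /* Scan downward from the most significant bit of the operand. */
    for (unsigned i = bits; i-- > 0; ) {
        if ((src >> i) & 1u)
            break;
        r.count++;
    }
    r.cf = (r.count == bits); /* every bit was zero, so the input was zero */
    r.zf = (r.count == 0);    /* the most significant bit was a one */
    return r;
}
```

For example, lzcnt_model(0, 32) yields a count of 32 with CF = 1, while lzcnt_model(0x80000000, 32) yields a count of 0 with ZF = 1.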


LZCNT is an Advanced Bit Manipulation (ABM) instruction. Support for the LZCNT instruction is 
indicated by CPUID Fn8000_0001_ECX[ABM] = 1. If the LZCNT instruction is not available, the 
encoding is interpreted as the BSR instruction. Software MUST check the CPUID bit once per 
program or library initialization before using the LZCNT instruction, or inconsistent behavior may 
result. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic                    Opcode        Description
LZCNT reg16, reg/mem16      F3 0F BD /r   Count the number of leading zeros in reg/mem16.
LZCNT reg32, reg/mem32      F3 0F BD /r   Count the number of leading zeros in reg/mem32.
LZCNT reg64, reg/mem64      F3 0F BD /r   Count the number of leading zeros in reg/mem64.


Related Instructions 


ANDN, BEXTR, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR, BLSMSK, BSF, 
BSR, POPCNT, T1MSKC, TZCNT, TZMSK
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rFLAGS Affected 
ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
                                     U               U   M   U   U   M
21  20   19   18  17  16  14  13:12  11  10  9   8   7   6   4   2   0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions 
Exception                  Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC             X             X          An unaligned memory reference was performed while alignment checking was enabled.
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MCOMMIT Commit Stores to Memory 


MCOMMIT provides a fencing and error detection capability for stores to system memory 
components that have delayed error reporting. Execution of MCOMMIT ensures that any preceding 
stores in the thread to such memory components have completed (target locations written, unless 
inhibited by an error condition) and that any errors encountered by those stores have been signaled to 
associated error logging resources. If any such errors are present, MCOMMIT clears rFLAGS.CF
to zero; otherwise, it sets rFLAGS.CF to one.


These errors are specific to the design of the platform and are reported only via MCOMMIT and in 
associated error logging registers on the platform; they are not visible to the Machine Check 
Architecture. Execution of MCOMMIT does not change any state in the error logging resources. Any 
error indications will need to be cleared by privileged software before MCOMMIT can return an error- 
free indication. Details on the error logging mechanisms may be found in the Processor Programming 
Reference manual for any product that supports this technology and the MCOMMIT instruction. 


The MCOMMIT instruction is supported if the feature flag CPUID Fn8000_0008_EBX[MCOMMIT] 
=1 (bit 8). The MCOMMIT instruction must be explicitly enabled by the OS by setting 
EFER.MCOMMIT=1 (EFER bit 17), otherwise attempted execution of MCOMMIT will result in a 
#UD exception. 
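
The support and enablement conditions above, together with the CF result convention, can be summarized in a short C sketch. It is a sketch under stated assumptions, not a definitive implementation: the helper names and macros are hypothetical, and the raw CPUID Fn8000_0008 EBX and EFER values are assumed to be supplied by the caller (reading EFER requires privileged software).

```c
#include <stdint.h>

#define CPUID_8000_0008_EBX_MCOMMIT (1u << 8)    /* feature flag, bit 8 */
#define EFER_MCOMMIT                (1ull << 17) /* OS enable, EFER bit 17 */

/* Returns 1 if MCOMMIT is both supported and enabled, given raw register values. */
static int mcommit_usable(uint32_t cpuid_ebx, uint64_t efer)
{
    return (cpuid_ebx & CPUID_8000_0008_EBX_MCOMMIT) != 0 &&
           (efer & EFER_MCOMMIT) != 0;
}

/* Interprets rFLAGS.CF after MCOMMIT: CF = 1 means no errors were signaled. */
static int mcommit_stores_ok(int cf)
{
    return cf == 1;
}
```

Note the polarity: a cleared carry flag is the error indication, which is the opposite of the common "CF set on failure" convention.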


MCOMMIT uses the same ordering rules as the SFENCE instruction. It may be executed at any
privilege level. 


Instruction Encoding 


Mnemonic                    Opcode        Description
MCOMMIT                     F3 0F 01 FA   Commit stores to memory


Related Instructions 


LFENCE, SFENCE, MFENCE 


rFLAGS Affected 
ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
                                     0               0   0   0   0   M
21  20   19   18  17  16  14  13:12  11  10  9   8   7   6   4   2   0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.
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MFENCE Memory Fence 


Acts as a barrier to force strong memory ordering (serialization) between load and store instructions 
preceding the MFENCE, and load and store instructions that follow the MFENCE. The processor may 
perform loads out of program order with respect to non-conflicting stores for certain memory types. 
The MFENCE instruction ensures that the system completes all previous memory accesses before 
executing subsequent accesses. 


The MFENCE instruction is weakly-ordered with respect to data and instruction prefetches. 
Speculative loads initiated by the processor, or specified explicitly using cache-prefetch instructions, 
can be reordered around an MFENCE. 


In addition to load and store instructions, the MFENCE instruction is strongly ordered with respect to 
other MFENCE instructions, LFENCE instructions, SFENCE instructions, serializing instructions, 
and CLFLUSH instructions. Further details on the use of MFENCE to order accesses among differing 
memory types may be found in AMD64 Architecture Programmer’s Manual Volume 2: System 
Programming, see 7.4 “Memory Types” on page 172. 


The MFENCE instruction is a serializing instruction. 


MFENCE is an SSE2 instruction. Support for SSE2 instructions is indicated by CPUID
Fn0000_0001_EDX[SSE2] = 1.


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Instruction Encoding 


Mnemonic                    Opcode        Description
MFENCE                      0F AE F0      Force strong ordering of (serialized) load and store operations.


Related Instructions 


LFENCE, SFENCE, MCOMMIT 


rFLAGS Affected 


None 


Exceptions 


Exception                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
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MONITORX Setup Monitor Address 


Establishes a linear address range of memory for hardware to monitor and puts the processor in the 
monitor event pending state. When in the monitor event pending state, the monitoring hardware 
detects stores to the specified linear address range and causes the processor to exit the monitor event 
pending state. The MWAIT and MWAITX instructions use the state of the monitor hardware. 


The address range should be a write-back memory type. Executing MONITORX on an address range 
for a non-write-back memory type is not guaranteed to cause the processor to enter the monitor event 
pending state. The size of the linear address range that is established by the MONITORX instruction 
can be determined by CPUID function 0000_0005h. 


The rAX register provides the effective address. The DS segment is the default segment used to create 
the linear address. Segment overrides may be used with the MONITORX instruction. 


The ECX register specifies optional extensions for the MONITORX instruction. There are currently 
no extensions defined and setting any bits in ECX will result in a #GP exception. The ECX register 
operand is implicitly 32-bits. 


The EDX register specifies optional hints for the MONITORX instruction. There are currently no 
hints defined and EDX is ignored by the processor. The EDX register operand is implicitly 32-bits. 


The MONITORX instruction can be executed at any privilege level and MSR 
C001_0015h[MonMwaitUserEn] has no effect on MONITORX. 


MONITORX performs the same segmentation and paging checks as a 1-byte read. 


Support for the MONITORX instruction is indicated by CPUID Fn8000_0001_ECX[MONITORX] 
(bit 29) = 1. 


Software must check the CPUID bit once per program or library initialization before using the
MONITORX instruction, or inconsistent behavior may result.


The following pseudo-code shows typical usage of a MONITORX/MWAITX pair:


EAX = Linear_Address_to_Monitor;
ECX = 0; // Extensions
EDX = 0; // Hints


while (!matching_store_done) {
    MONITORX EAX, ECX, EDX
    IF (!matching_store_done) {
        MWAITX EAX, ECX
    }
}
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Mnemonic Opcode Description 
MONITORX 0F 01 FA Establishes a range to be monitored
Related Instructions 
MWAITX, MONITOR, MWAIT 
rFLAGS Affected 
None 
Exceptions 
Exception                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          MONITORX/MWAITX instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[MONITORX] = 0.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                           X     X             X          ECX was non-zero.
                                               X          A null data segment was used to reference memory.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.
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MOV Move 


Copies an immediate value or the value in a general-purpose register, segment register, or memory 
location (second operand) to a general-purpose register, segment register, or memory location. The 
source and destination must be the same size (byte, word, doubleword, or quadword) and cannot both 
be memory locations. 


In opcodes A0 through A3, the memory offsets (called moffsets) are address sized. In 64-bit mode,
memory offsets default to 64 bits. Opcodes A0 through A3, in 64-bit mode, are the only cases that support a
64-bit offset value. (In all other cases, offsets and displacements are a maximum of 32 bits.) The B8 
through BF (B8 +rq) opcodes, in 64-bit mode, are the only cases that support a 64-bit immediate value 
(in all other cases, immediate values are a maximum of 32 bits). 
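
As an illustration of the 64-bit immediate form, the following C sketch assembles the bytes of MOV reg64, imm64 (B8 +rq iq). The function name is hypothetical, and the sketch assumes the general REX prefix layout (0100WRXB, with REX.W selecting 64-bit operand size and REX.B extending the register number), which is described elsewhere in this manual rather than on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encodes MOV reg64, imm64 (REX.W + B8 +rq iq) into out[0..9]; returns 10. */
static size_t encode_mov_reg64_imm64(uint8_t out[10], unsigned reg, uint64_t imm)
{
    assert(reg < 16);
    /* REX prefix: W = 1 for 64-bit operand size, B = bit 3 of the register number. */
    out[0] = (uint8_t)(0x48 | (reg >> 3));
    /* Opcode B8 plus the low three bits of the register number. */
    out[1] = (uint8_t)(0xB8 + (reg & 7));
    /* Full 64-bit immediate, least significant byte first. */
    for (int i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(imm >> (8 * i));
    return 10;
}
```

With reg = 0 (RAX) and imm = 1122334455667788h, the result is the ten bytes 48 B8 88 77 66 55 44 33 22 11.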


When reading segment-registers with a 32-bit operand size, the processor zero-extends the 16-bit 
selector results to 32 bits. When reading segment-registers with a 64-bit operand size, the processor 
zero-extends the 16-bit selector to 64 bits. If the destination operand specifies a segment register (DS, 
ES, FS, GS, or SS), the source operand must be a valid segment selector. 


It is possible to move a null segment selector value (0000h through 0003h) into the DS, ES, FS, or GS register.
This action does not cause a general protection fault, but a subsequent reference to such a segment 
does cause a #GP exception. For more information about segment selectors, see “Segment Selectors 
and Registers” in Volume 2. 


When the MOV instruction is used to load the SS register, the processor blocks external interrupts until 
after the execution of the following instruction. This action allows the following instruction to be a 
MOV instruction to load a stack pointer into the ESP register (MOV ESP, val) before an interrupt 
occurs. However, the LSS instruction provides a more efficient method of loading SS and ESP. 


Attempting to use the MOV instruction to load the CS register generates an invalid opcode exception 
(#UD). Use the far JMP, CALL, or RET instructions to load the CS register. 


To initialize a register to 0, rather than using a MOV instruction, it may be more efficient to use the 
XOR instruction with identical destination and source operands. 


Mnemonic                        Opcode      Description
MOV reg/mem8, reg8              88 /r       Move the contents of an 8-bit register to an 8-bit destination register or memory operand.
MOV reg/mem16, reg16            89 /r       Move the contents of a 16-bit register to a 16-bit destination register or memory operand.
MOV reg/mem32, reg32            89 /r       Move the contents of a 32-bit register to a 32-bit destination register or memory operand.
MOV reg/mem64, reg64            89 /r       Move the contents of a 64-bit register to a 64-bit destination register or memory operand.
MOV reg8, reg/mem8              8A /r       Move the contents of an 8-bit register or memory operand to an 8-bit destination register.
MOV reg16, reg/mem16            8B /r       Move the contents of a 16-bit register or memory operand to a 16-bit destination register.
MOV reg32, reg/mem32            8B /r       Move the contents of a 32-bit register or memory operand to a 32-bit destination register.
MOV reg64, reg/mem64            8B /r       Move the contents of a 64-bit register or memory operand to a 64-bit destination register.
MOV reg16/32/64/mem16, segReg   8C /r       Move the contents of a segment register to a 16-bit, 32-bit, or 64-bit destination register or to a 16-bit memory operand.
MOV segReg, reg/mem16           8E /r       Move the contents of a 16-bit register or memory operand to a segment register.
MOV AL, moffset8                A0          Move 8-bit data at a specified memory offset to the AL register.
MOV AX, moffset16               A1          Move 16-bit data at a specified memory offset to the AX register.
MOV EAX, moffset32              A1          Move 32-bit data at a specified memory offset to the EAX register.
MOV RAX, moffset64              A1          Move 64-bit data at a specified memory offset to the RAX register.
MOV moffset8, AL                A2          Move the contents of the AL register to an 8-bit memory offset.
MOV moffset16, AX               A3          Move the contents of the AX register to a 16-bit memory offset.
MOV moffset32, EAX              A3          Move the contents of the EAX register to a 32-bit memory offset.
MOV moffset64, RAX              A3          Move the contents of the RAX register to a 64-bit memory offset.
MOV reg8, imm8                  B0 +rb ib   Move an 8-bit immediate value into an 8-bit register.
MOV reg16, imm16                B8 +rw iw   Move a 16-bit immediate value into a 16-bit register.
MOV reg32, imm32                B8 +rd id   Move a 32-bit immediate value into a 32-bit register.
MOV reg64, imm64                B8 +rq iq   Move a 64-bit immediate value into a 64-bit register.
MOV reg/mem8, imm8              C6 /0 ib    Move an 8-bit immediate value to an 8-bit register or memory operand.
MOV reg/mem16, imm16            C7 /0 iw    Move a 16-bit immediate value to a 16-bit register or memory operand.
MOV reg/mem32, imm32            C7 /0 id    Move a 32-bit immediate value to a 32-bit register or memory operand.
MOV reg/mem64, imm32            C7 /0 id    Move a 32-bit signed immediate value to a 64-bit register or memory operand.


Related Instructions 


MOV CRn, MOV DRn, MOVD, MOVSX, MOVZX, MOVSXD, MOVSx 


rFLAGS Affected 
None 
Exceptions 
Exception                            Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                  X     X             X          An attempt was made to load the CS register.
Segment not present, #NP (selector)                      X          The DS, ES, FS, or GS register was loaded with a non-null segment selector and the segment was marked not present.
Stack, #SS                           X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
Stack, #SS (selector)                                    X          The SS register was loaded with a non-null segment selector, and the segment was marked not present.
General protection, #GP              X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                                         X          The destination operand was in a non-writable segment.
                                                         X          A null data segment was used to reference memory.
General protection, #GP (selector)                       X          A segment register was loaded, but the segment descriptor exceeded the descriptor table limit.
                                                         X          A segment register was loaded and the segment selector's TI bit was set, but the LDT selector was a null selector.
                                                         X          The SS register was loaded with a null segment selector in non-64-bit mode or while CPL = 3.
                                                         X          The SS register was loaded and the segment selector RPL and the segment descriptor DPL were not equal to the CPL.
                                                         X          The SS register was loaded and the segment pointed to was not a writable data segment.
                                                         X          The DS, ES, FS, or GS register was loaded and the segment pointed to was a data or non-conforming code segment, but the RPL or CPL was greater than the DPL.
                                                         X          The DS, ES, FS, or GS register was loaded and the segment pointed to was not a data segment or readable code segment.
Page fault, #PF                            X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                       X             X          An unaligned memory reference was performed while alignment checking was enabled.
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MOVBE Move Big Endian 


Loads or stores a general purpose register while swapping the byte order. Operates on 16-bit, 32-bit, or 
64-bit values. Converts big-endian formatted memory data to little-endian format when loading a 
register and reverses the conversion when storing a GPR to memory. 


The load form reads a 16-, 32-, or 64-bit value from memory, swaps the byte order, and places the 
reordered value in a general-purpose register. When the operand size is 16 bits, the upper word of the 
destination register remains unchanged. In 64-bit mode, when the operand size is 32 bits, the upper 
doubleword of the destination register is cleared. 


The store form takes a 16-, 32-, or 64-bit value from a general-purpose register, swaps the byte order, 
and stores the reordered value in the specified memory location. The contents of the source GPR
remain unchanged.


In the 16-bit swap, the upper and lower bytes are exchanged. In the doubleword swap operation, bits 
7:0 are exchanged with bits 31:24 and bits 15:8 are exchanged with bits 23:16. In the quadword swap 
operation, bits 7:0 are exchanged with bits 63:56, bits 15:8 with bits 55:48, bits 23:16 with bits 47:40, 
and bits 31:24 with bits 39:32. 
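
The bit exchanges listed above can be written out directly. The C sketch below models only the data transformation, not the memory access; the helper names (swap16, swap32, swap64, movbe_load16, movbe_load32) are hypothetical, and the load helpers assume a 64-bit destination register as in 64-bit mode.

```c
#include <stdint.h>

/* 16-bit swap: the upper and lower bytes are exchanged. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Doubleword swap: bits 7:0 <-> 31:24 and bits 15:8 <-> 23:16. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Quadword swap: each half is byte-reversed and the two halves trade places. */
static uint64_t swap64(uint64_t v)
{
    return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* 16-bit load form: only the low word of the destination register changes. */
static uint64_t movbe_load16(uint64_t dest, uint16_t mem)
{
    return (dest & ~(uint64_t)0xFFFF) | swap16(mem);
}

/* 32-bit load form in 64-bit mode: the upper doubleword is cleared. */
static uint64_t movbe_load32(uint32_t mem)
{
    return (uint64_t)swap32(mem);
}
```

For example, swap32(0x11223344) returns 0x44332211.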


Support for the MOVBE instruction is indicated by CPUID Fn0000_0001_ECX[MOVBE] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Instruction Encoding 


Mnemonic                    Opcode        Description
MOVBE reg16, mem16          0F 38 F0 /r   Load the low word of a general-purpose register from a 16-bit memory location while swapping the bytes.
MOVBE reg32, mem32          0F 38 F0 /r   Load the low doubleword of a general-purpose register from a 32-bit memory location while swapping the bytes.
MOVBE reg64, mem64          0F 38 F0 /r   Load a 64-bit register from a 64-bit memory location while swapping the bytes.
MOVBE mem16, reg16          0F 38 F1 /r   Store the low word of a general-purpose register to a 16-bit memory location while swapping the bytes.
MOVBE mem32, reg32          0F 38 F1 /r   Store the low doubleword of a general-purpose register to a 32-bit memory location while swapping the bytes.
MOVBE mem64, reg64          0F 38 F1 /r   Store the contents of a 64-bit general-purpose register to a 64-bit memory location while swapping the bytes.


Related Instruction 


BSWAP 
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rFLAGS Affected 
None 
Exceptions 
Exception                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          Instruction not supported as indicated by CPUID Fn0000_0001_ECX[MOVBE] = 0.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          The destination operand was in a non-writable segment.
                                               X          A null data segment was used to reference memory.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC             X             X          An unaligned memory reference was performed while alignment checking was enabled.
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MOVD Move Doubleword or Quadword 


Moves a 32-bit or 64-bit value in one of the following ways:

• from a 32-bit or 64-bit general-purpose register or memory location to the low-order 32 or 64 bits
  of an XMM register, with zero-extension to 128 bits

• from the low-order 32 or 64 bits of an XMM register to a 32-bit or 64-bit general-purpose register or
  memory location

• from a 32-bit or 64-bit general-purpose register or memory location to the low-order 32 bits (with
  zero-extension to 64 bits) or the full 64 bits of an MMX register

• from the low-order 32 or the full 64 bits of an MMX register to a 32-bit or 64-bit general-purpose
  register or memory location


Figure 3-1 on page 239 illustrates the operation of the MOVD instruction. 


The MOVD instruction form that moves data to or from MMX registers is part of the MMX instruction
subset. Support for MMX instructions is indicated by CPUID Fn0000_0001_EDX[MMX] or
Fn8000_0001_EDX[MMX] = 1.


The MOVD instruction form that moves data to or from XMM registers is part of the SSE2 instruction 
subset. Support for SSE2 instructions is indicated by CPUID Fn0000_0001_EDX[SSE2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 
[Figure: one panel per MOVD form, showing the source and destination bit ranges: reg/mem32 to xmm, reg/mem64 to xmm (with REX prefix), xmm to reg/mem32, xmm to reg/mem64 (with REX prefix), reg/mem32 to mmx, reg/mem64 to mmx (with REX prefix), mmx to reg/mem32, and mmx to reg/mem64 (with REX prefix).]


Figure 3-1. MOVD Instruction Operation
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Instruction Encoding 


Mnemonic                    Opcode        Description
MOVD xmm, reg/mem32         66 0F 6E /r   Move 32-bit value from a general-purpose register or 32-bit memory location to an XMM register.
MOVD¹ xmm, reg/mem64        66 0F 6E /r   Move 64-bit value from a general-purpose register or 64-bit memory location to an XMM register.
MOVD reg/mem32, xmm         66 0F 7E /r   Move 32-bit value from an XMM register to a 32-bit general-purpose register or memory location.
MOVD¹ reg/mem64, xmm        66 0F 7E /r   Move 64-bit value from an XMM register to a 64-bit general-purpose register or memory location.
MOVD mmx, reg/mem32         0F 6E /r      Move 32-bit value from a general-purpose register or 32-bit memory location to an MMX register.
MOVD mmx, reg/mem64         0F 6E /r      Move 64-bit value from a general-purpose register or 64-bit memory location to an MMX register.
MOVD reg/mem32, mmx         0F 7E /r      Move 32-bit value from an MMX register to a 32-bit general-purpose register or memory location.
MOVD reg/mem64, mmx         0F 7E /r      Move 64-bit value from an MMX register to a 64-bit general-purpose register or memory location.

Note: 1. Also known as MOVQ in some developer tools.


Related Instructions 


MOVDQA, MOVDQU, MOVDQ2Q, MOVQ, MOVQ2DQ 


rFLAGS Affected 


None 


MXCSR Flags Affected 


None 
Exceptions 
Exception                                  Real  Virtual 8086  Protected  Description
Invalid opcode, #UD                        X     X             X          MMX instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[MMX] or Fn8000_0001_EDX[MMX] = 0.
                                           X     X             X          SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
                                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
                                           X     X             X          The instruction used XMM registers while CR4.OSFXSR = 0.
Device not available, #NM                  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
Stack, #SS                                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
Page fault, #PF                                  X             X          A page fault resulted from the execution of the instruction.
x87 floating-point exception pending, #MF  X     X             X          An x87 floating-point exception was pending and the instruction referenced an MMX register.
Alignment check, #AC                             X             X          An unaligned memory reference was performed while alignment checking was enabled.
MOVMSKPD Extract Packed Double-Precision
Floating-Point Sign Mask


Moves the sign bits of two packed double-precision floating-point values in an XMM register (second
operand) to the two low-order bits of a general-purpose register (first operand) with zero-extension. 
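
The effect on the destination register can be modeled in a few lines of C. This is a sketch under stated assumptions: the function name is hypothetical, the XMM register is represented as an array of two doubles with element 0 in bits 63:0, and double is assumed to be an IEEE 754 64-bit value.

```c
#include <stdint.h>
#include <string.h>

/* Models MOVMSKPD: bit 0 = sign of the low double, bit 1 = sign of the high double. */
static uint32_t movmskpd_model(const double xmm[2])
{
    uint32_t mask = 0;
    for (int i = 0; i < 2; i++) {
        uint64_t bits;
        /* Reinterpret the double as raw bits without violating aliasing rules. */
        memcpy(&bits, &xmm[i], sizeof bits);
        mask |= (uint32_t)(bits >> 63) << i;
    }
    return mask; /* all bits above bit 1 are zero (zero-extension) */
}
```

Because only the sign bit is examined, negative zero contributes a 1 to the mask just like any other negative value.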


The function of the MOVMSKPD instruction is illustrated by the diagram below:


[Diagram movmskpd.eps: the sign bits of the XMM register (bits 127 and 63) are copied to bits 1 and 0 of the 32-bit destination register.]


The MOVMSKPD instruction is an SSE2 instruction. Support for SSE2 instructions is indicated by 
CPUID Fn0000_0001_EDX[SSE2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Instruction Encoding 


Mnemonic                    Opcode        Description
MOVMSKPD reg32, xmm         66 0F 50 /r   Move sign bits 127 and 63 in an XMM register to a 32-bit general-purpose register.


Related Instructions


MOVMSKPS, PMOVMSKB


rFLAGS Affected


None


MXCSR Flags Affected


None
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Exceptions 
Exception (vector)         Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
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MOVMSKPS Extract Packed Single-Precision 
Floating-Point Sign Mask 


Moves the sign bits of four packed single-precision floating-point values in an XMM register (second 
operand) to the four low-order bits of a general-purpose register (first operand) with zero-extension. 


The MOVMSKPS instruction is an SSE2 instruction. Support for SSE2 instructions is indicated by
CPUID Fn0000_0001_EDX[SSE2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic                    Opcode        Description
MOVMSKPS reg32, xmm         0F 50 /r      Move sign bits 127, 95, 63, 31 in an XMM register to a 32-bit general-purpose register.


[Diagram movmskps.eps: the sign bits of the XMM register (bits 127, 95, 63, and 31) are copied to bits 3 through 0 of reg32, respectively.]


Related Instructions 


MOVMSKPD, PMOVMSKB 


rFLAGS Affected 


None 


MXCSR Flags Affected 


None 
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Exceptions 
Exception                  Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
                           X     X             X          The operating-system FXSAVE/FXRSTOR support bit (OSFXSR) of CR4 was cleared to 0.
                           X     X             X          The emulate bit (EM) of CR0 was set to 1.
Device not available, #NM  X     X             X          The task-switch bit (TS) of CR0 was set to 1.
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MOVNTI Move Non-Temporal Doubleword or 
Quadword 


Stores a value in a 32-bit or 64-bit general-purpose register (second operand) in a memory location 
(first operand). This instruction indicates to the processor that the data is non-temporal and is unlikely 
to be used again soon. The processor treats the store as a write-combining (WC) memory write, which 
minimizes cache pollution. The exact method by which cache pollution is minimized depends on the 
hardware implementation of the instruction. For further information, see “Memory Optimization” in 
Volume 1. 


The MOVNTI instruction is weakly-ordered with respect to other instructions that operate on memory. 
Software should use an SFENCE instruction to force strong memory ordering of MOVNTI with 
respect to other stores. 


The MOVNTI instruction is an SSE2 instruction. Support for SSE2 instructions is indicated by 
CPUID Fn0000_0001_EDX[SSE2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic                    Opcode        Description
MOVNTI mem32, reg32         0F C3 /r      Stores a 32-bit general-purpose register value into a 32-bit memory location, minimizing cache pollution.
MOVNTI mem64, reg64         0F C3 /r      Stores a 64-bit general-purpose register value into a 64-bit memory location, minimizing cache pollution.


Related Instructions 


MOVNTDQ, MOVNTPD, MOVNTPS, MOVNTQ 


rFLAGS Affected 
None 
Exceptions 
Exception (vector)         Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD        X     X             X          SSE2 instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SSE2] = 0.
Stack, #SS                 X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP    X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                               X          A null data segment was used to reference memory.
                                               X          The destination operand was in a non-writable segment.
Page fault, #PF                  X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC             X             X          An unaligned memory reference was performed while alignment checking was enabled.
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MOVS Move String 
MOVSB
MOVSW
MOVSD
MOVSQ


Moves a byte, word, doubleword, or quadword from the memory location pointed to by DS:rSI to the
memory location pointed to by ES:rDI, and then increments or decrements the rSI and rDI registers
according to the state of the DF flag in the rFLAGS register.


If the DF flag is 0, the instruction increments both pointers; otherwise, it decrements them. It 
increments or decrements the pointers by 1, 2, 4, or 8, depending on the size of the operands. 
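
The copy-then-adjust behavior can be expressed as a small software model. In the C sketch below, the movs_state type and movs_step function are hypothetical names; segmentation is ignored (DS and ES are assumed to map onto one flat byte array), and rSI and rDI are treated as offsets into that array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical machine state: flat memory plus the rSI/rDI offsets and DF. */
typedef struct {
    uint8_t *mem;
    size_t rsi, rdi;
    int df; /* 0: increment the pointers, 1: decrement the pointers */
} movs_state;

/* One MOVSx step for an element of `size` bytes (1, 2, 4, or 8). */
static void movs_step(movs_state *s, size_t size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    /* Copy one element from [rSI] to [rDI]. */
    memmove(s->mem + s->rdi, s->mem + s->rsi, size);
    /* Adjust both pointers by the element size, in the direction given by DF. */
    if (s->df) {
        s->rsi -= size;
        s->rdi -= size;
    } else {
        s->rsi += size;
        s->rdi += size;
    }
}
```

A repeated move corresponds to calling movs_step once per element; with DF = 1 the pointers should start at the last element of each buffer rather than the first.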


The forms of the MOVSx instruction with explicit operands address the first operand at seg:[rSI]. The 
value of seg defaults to the DS segment, but can be overridden by a segment prefix. These instructions 
always address the second operand at ES:[rDI] (ES may not be overridden). The explicit operands 
serve only to specify the type (size) of the value being moved. 


The no-operands forms of the instruction use the DS:[rSI] and ES:[rDI] registers to point to the value 
to be moved (they do not allow a segment prefix). The mnemonic determines the size of the operands. 


Do not confuse this MOVSD instruction with the same-mnemonic MOVSD (move scalar double- 
precision floating-point) instruction in the 128-bit media instruction set. Assemblers can distinguish 
the instructions by the number and type of operands. 


The MOVSx instructions support the REP prefixes. For details about the REP prefixes, see “Repeat 
Prefixes” on page 12. 


Mnemonic                    Opcode        Description
MOVS mem8, mem8             A4            Move byte at DS:rSI to ES:rDI, and then increment or decrement rSI and rDI.
MOVS mem16, mem16           A5            Move word at DS:rSI to ES:rDI, and then increment or decrement rSI and rDI.
MOVS mem32, mem32           A5            Move doubleword at DS:rSI to ES:rDI, and then increment or decrement rSI and rDI.
MOVS mem64, mem64           A5            Move quadword at DS:rSI to ES:rDI, and then increment or decrement rSI and rDI.
MOVSB                       A4            Move byte at DS:rSI to ES:rDI, and then increment or decrement rSI and rDI.
MOVSW                       A5            Move word at DS:rSI to ES:rDI, and then increment or decrement rSI and rDI.
MOVSD                       A5            Move doubleword at DS:rSI to ES:rDI, and then increment or decrement rSI and rDI.
MOVSQ                       A5            Move quadword at DS:rSI to ES:rDI, and then increment or decrement rSI and rDI.
Related Instructions 
MOV, LODSx, STOSx 
rFLAGS Affected 
None 
Exceptions 
Exception               Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                            X          The destination operand was in a non-writable segment.
                                            X          A null data segment was used to reference memory.
Page fault, #PF               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X             X          An unaligned memory reference was performed while alignment checking was enabled.
MOVSX Move with Sign-Extension


Copies the value in a register or memory location (second operand) into a register (first operand),
extending the most significant bit of an 8-bit or 16-bit value into all higher bits in a 16-bit, 32-bit, or
64-bit register.
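
As a rough illustration of the sign extension described above, the following sketch replicates the most significant bit of the source value into the higher bits of a wider result. The function name sign_extend is hypothetical, and values are treated as unsigned bit patterns.

```python
def sign_extend(value, src_bits, dst_bits):
    """Replicate the top bit of a src_bits-wide value into a dst_bits-wide result."""
    src_mask = (1 << src_bits) - 1
    dst_mask = (1 << dst_bits) - 1
    value &= src_mask
    if value >> (src_bits - 1):
        # Most significant source bit is 1: set every higher destination bit.
        value |= dst_mask & ~src_mask
    return value
```

For example, an 8-bit source of 80h extended to 32 bits yields FFFF_FF80h, while 7Fh extended to 16 bits stays 007Fh.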


Mnemonic                    Opcode      Description
MOVSX reg16, reg/mem8       0F BE /r    Move the contents of an 8-bit register or memory location to a 16-bit register with sign extension.
MOVSX reg32, reg/mem8       0F BE /r    Move the contents of an 8-bit register or memory location to a 32-bit register with sign extension.
MOVSX reg64, reg/mem8       0F BE /r    Move the contents of an 8-bit register or memory location to a 64-bit register with sign extension.
MOVSX reg32, reg/mem16      0F BF /r    Move the contents of a 16-bit register or memory location to a 32-bit register with sign extension.
MOVSX reg64, reg/mem16      0F BF /r    Move the contents of a 16-bit register or memory location to a 64-bit register with sign extension.

Related Instructions 
MOVSXD, MOVZX 
rFLAGS Affected 
None 
Exceptions 
Exception               Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                            X          A null data segment was used to reference memory.
Page fault, #PF               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X             X          An unaligned memory reference was performed while alignment checking was enabled.
MOVSXD Move with Sign-Extend Doubleword


Copies the 32-bit value in a register or memory location (second operand) into a 64-bit register (first 
operand), extending the most significant bit of the 32-bit value into all higher bits of the 64-bit register. 


This instruction requires the REX prefix 64-bit operand size bit (REX.W) to be set to 1 to sign-extend 
a 32-bit source operand to a 64-bit result. Without the REX operand-size prefix, the operand size will 
be 32 bits, the default for 64-bit mode, and the source is zero-extended into a 64-bit register. With a 16- 
bit operand size, only 16 bits are copied, without modifying the upper 48 bits in the destination. 
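
The three operand-size cases in the preceding paragraph can be sketched as follows. This is a minimal model under stated assumptions: the function name movsxd and its operand_size parameter are hypothetical, and dest stands for the prior 64-bit contents of the destination register.

```python
MASK64 = (1 << 64) - 1


def movsxd(dest, src, operand_size):
    """Return the new 64-bit destination value for a given operand size."""
    if operand_size == 64:
        # REX.W = 1: sign-extend the 32-bit source to 64 bits.
        src &= 0xFFFFFFFF
        return src | 0xFFFFFFFF00000000 if src & 0x80000000 else src
    if operand_size == 32:
        # No REX.W: the 32-bit result is zero-extended into the 64-bit register.
        return src & 0xFFFFFFFF
    if operand_size == 16:
        # Only 16 bits are copied; the upper 48 bits of the destination are kept.
        return (dest & ~0xFFFF & MASK64) | (src & 0xFFFF)
    raise ValueError("operand size must be 16, 32, or 64 bits")
```

Only the 64-bit case performs a sign extension, which is why the text requires REX.W to be set to 1 for the intended use of this instruction.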


This instruction is available only in 64-bit mode. In legacy or compatibility mode this opcode is 
interpreted as ARPL. 


Mnemonic                    Opcode      Description
MOVSXD reg64, reg/mem32     63 /r       Move the contents of a 32-bit register or memory operand to a 64-bit register with sign extension.


Related Instructions 


MOVSX, MOVZX 


rFLAGS Affected 
None 
Exceptions 
Exception               Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS                                  X          A memory address was non-canonical.
General protection, #GP                     X          A memory address was non-canonical.
Page fault, #PF                             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X          An unaligned memory reference was performed while alignment checking was enabled.
MOVZX Move with Zero-Extension


Copies the value in a register or memory location (second operand) into a register (first operand), zero- 
extending the value to fit in the destination register. The operand-size attribute determines the size of 
the zero-extended value. 


Mnemonic                    Opcode      Description
MOVZX reg16, reg/mem8       0F B6 /r    Move the contents of an 8-bit register or memory operand to a 16-bit register with zero-extension.
MOVZX reg32, reg/mem8       0F B6 /r    Move the contents of an 8-bit register or memory operand to a 32-bit register with zero-extension.
MOVZX reg64, reg/mem8       0F B6 /r    Move the contents of an 8-bit register or memory operand to a 64-bit register with zero-extension.
MOVZX reg32, reg/mem16      0F B7 /r    Move the contents of a 16-bit register or memory operand to a 32-bit register with zero-extension.
MOVZX reg64, reg/mem16      0F B7 /r    Move the contents of a 16-bit register or memory operand to a 64-bit register with zero-extension.


Related Instructions 


MOVSXD, MOVSX 


rFLAGS Affected 
None 

Exceptions 

Exception               Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                            X          A null data segment was used to reference memory.
Page fault, #PF               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X             X          An unaligned memory reference was performed while alignment checking was enabled.
MUL Unsigned Multiply


Multiplies the unsigned byte, word, doubleword, or quadword value in the specified register or 
memory location by the value in AL, AX, EAX, or RAX and stores the result in AX, DX:AX, 
EDX:EAX, or RDX:RAX (depending on the operand size). It puts the high-order bits of the product in 
AH, DX, EDX, or RDX. 


If the upper half of the product is non-zero, the instruction sets the carry flag (CF) and overflow flag 
(OF) both to 1. Otherwise, it clears CF and OF to 0. The other arithmetic flags (SF, ZF, AF, PF) are 
undefined. 
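
The split of the product into low and high halves, and the resulting CF and OF setting, can be sketched as below. The helper name mul and its return convention are assumptions made for illustration; the undefined flags (SF, ZF, AF, PF) are not modeled.

```python
def mul(accumulator, operand, bits):
    """Model unsigned MUL for an operand size of 8, 16, 32, or 64 bits.

    Returns (low_half, high_half, carry). CF and OF both take the carry value.
    """
    mask = (1 << bits) - 1
    product = (accumulator & mask) * (operand & mask)
    low = product & mask      # goes to AL, AX, EAX, or RAX
    high = product >> bits    # goes to AH, DX, EDX, or RDX
    carry = int(high != 0)    # set when the upper half is non-zero
    return low, high, carry
```

For the 8-bit case FFh times FFh, the product FE01h leaves 01h in AL and FEh in AH, so both flags are set.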


Mnemonic                    Opcode      Description
MUL reg/mem8                F6 /4       Multiplies an 8-bit register or memory operand by the contents of the AL register and stores the result in the AX register.
MUL reg/mem16               F7 /4       Multiplies a 16-bit register or memory operand by the contents of the AX register and stores the result in the DX:AX register.
MUL reg/mem32               F7 /4       Multiplies a 32-bit register or memory operand by the contents of the EAX register and stores the result in the EDX:EAX register.
MUL reg/mem64               F7 /4       Multiplies a 64-bit register or memory operand by the contents of the RAX register and stores the result in the RDX:RAX register.


Related Instructions 
DIV 


rFLAGS Affected 


ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          M                   U    U    U    U    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.
Exceptions

Exception               Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                            X          A null data segment was used to reference memory.
Page fault, #PF               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X             X          An unaligned memory reference was performed while alignment checking was enabled.
MULX Multiply Unsigned


Computes the unsigned product of the specified source operand and the implicit source operand rDX. 
Writes the upper half of the product to the first destination and the lower half to the second. Does not 
affect the arithmetic flags. 
This instruction has three operands:

MULX dest1, dest2, src

In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand
size is 64 bits; if VEX.W is 0, the operand size is 32 bits. In 32-bit mode, VEX.W is ignored. 16-bit
operands are not supported.


The first and second operands (dest1 and dest2) are general purpose registers. The specified source 
operand (src) is either a general purpose register or a memory operand. If the first and second operands 
specify the same register, the register receives the upper half of the product. 
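
The operand roles, including the case where both destinations name the same register, can be sketched as follows. The dictionary-based register file and the function name mulx are assumptions for illustration; writing the lower half first and the upper half last is simply one way to reproduce the documented outcome.

```python
def mulx(regs, dest1, dest2, src, bits=64):
    """Model MULX on a dict of register values (hypothetical register file)."""
    mask = (1 << bits) - 1
    product = (regs["rdx"] & mask) * (src & mask)  # rDX is the implicit source
    regs[dest2] = product & mask   # lower half to the second destination
    regs[dest1] = product >> bits  # upper half to the first destination (written last)


regs = {"rdx": 0xFFFFFFFF, "rax": 0, "rbx": 0}
mulx(regs, "rax", "rbx", 0xFFFFFFFF, bits=32)

same = {"rdx": 0xFFFFFFFF, "rcx": 0}
mulx(same, "rcx", "rcx", 0xFFFFFFFF, bits=32)  # register keeps the upper half
```

No flag state appears in the model because the instruction does not affect the arithmetic flags.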


This instruction is a BMI2 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
MULX reg32, reg32, reg/mem32    C4    RXB.02           0.dest2.0.11  F6 /r
MULX reg64, reg64, reg/mem64    C4    RXB.02           1.dest2.0.11  F6 /r


Related Instructions 


rFLAGS Affected 

None. 

Exceptions 

Exception               Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD     X     X                        BMI2 instructions are only recognized in protected mode.
                                            X          BMI2 instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI2] = 0.
                                            X          VEX.L is 1.
Stack, #SS                                  X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X          A memory address exceeded a data segment limit or was non-canonical.
                                            X          A null data segment was used to reference memory.
Page fault, #PF                             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X          An unaligned memory reference was performed while alignment checking was enabled.
MWAITX Monitor Wait with Timeout


Used in conjunction with the MONITORX instruction to cause a processor to wait until a store occurs
to a specific linear address range from another processor or the timer expires. The previously executed
MONITORX instruction causes the processor to enter the monitor event pending state. The MWAITX
instruction may enter an implementation-dependent power state until the monitor event pending state
is exited. The MWAITX instruction has the same effect on architectural state as the NOP instruction.


Events that cause an exit from the monitor event pending state include: 


• A store from another processor matches the address range established by the MONITORX instruction.
• The timer expires.
• Any unmasked interrupt, including INTR, NMI, SMI, INIT.
• RESET.
• Any far control transfer that occurs between the MONITORX and the MWAITX.

EAX specifies optional hints for the MWAITX instruction. The optimized C-state request is
communicated through EAX[7:4]. The processor C-state is EAX[7:4]+1, so to request C0, place
the value F in EAX[7:4], and to request C1, place the value 0 in EAX[7:4]. All other components of
EAX should be zero when making the C1 request. The processor ignores reserved bits that are set in
EAX. This is implicitly a 32-bit operand.


ECX specifies optional extensions for the MWAITX instruction. The extensions currently defined for 
ECX are: 


• Bit 0: When set, allows interrupts to wake MWAITX, even when eFLAGS.IF = 0. Support for this
extension is indicated by a feature flag returned by the CPUID instruction.
• Bit 1: When set, EBX contains the maximum wait time expressed in Software P0 clocks, the same
clocks counted by the TSC. Setting bit 1 but passing a value of zero in EBX is equivalent to
clearing bit 1: the timer will not be an exit condition.
• Bits 31:2: When non-zero, results in a #GP(0) exception.

This is implicitly a 32-bit operand.
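
The EAX hint and ECX extension encodings described above can be sketched as a small helper that computes the register values software would load before executing MWAITX. The function name and parameters are hypothetical, and the wrap-around used for C0 is inferred from the statement that C0 corresponds to the value F in EAX[7:4].

```python
def mwaitx_operands(c_state, allow_interrupt_break=False, timeout_clocks=0):
    """Return (eax, ebx, ecx) register values for an MWAITX request."""
    if not 0 <= c_state <= 15:
        raise ValueError("C-state must be in the range 0 to 15")
    # C-state = EAX[7:4] + 1, so C1 encodes as 0 and C0 wraps around to Fh.
    eax = ((c_state - 1) & 0xF) << 4
    ecx = 0
    if allow_interrupt_break:
        ecx |= 1 << 0  # bit 0: interrupts wake MWAITX even when IF = 0
    ebx = timeout_clocks & 0xFFFFFFFF
    if ebx:
        ecx |= 1 << 1  # bit 1: EBX holds the maximum wait time
    return eax, ebx, ecx
```

The helper leaves ECX bit 1 clear when the timeout is zero, since a zero timeout with bit 1 set is described as equivalent to not enabling the timer. ECX bits 31:2 are never set, which avoids the #GP(0) case.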

CPUID Function 0000_0005h indicates support for extended features of MONITORX/MWAITX as 
well as MONITOR/MWAIT: 


• CPUID Fn0000_0005_ECX[EMX] = 1 indicates support for enumeration of
MONITOR/MWAIT/MONITORX/MWAITX extensions.
• CPUID Fn0000_0005_ECX[IBE] = 1 indicates that MWAIT/MWAITX can set ECX[0] to allow
interrupts to cause an exit from the monitor event pending state even when eFLAGS.IF = 0.


The MWAITX instruction can be executed at any privilege level and MSR 
C001_0015h[MonMwaitUserEn] has no effect on MWAITX. 

Support for the MWAITX instruction is indicated by CPUID Fn8000_0001_ECX[MONITORX] (bit
29) = 1.


Software must check the CPUID bit once per program or library initialization before using the 
MWAITX instruction, or inconsistent behavior may result. 


The use of the MWAITX instruction is contingent upon the satisfaction of the following coding 
requirements: 


• MONITORX must precede the MWAITX and occur in the same loop.
• MWAITX must be conditionally executed only if the awaited store has not already occurred. (This
prevents a race condition between the MONITORX instruction arming the monitoring hardware
and the store intended to trigger the monitoring hardware.)


There is no indication after exiting MWAITX of why the processor exited or whether the timer expired.
It is up to software to check whether the awaited store has occurred and, if not, to determine how much
time has elapsed if it wants to re-establish the MONITORX with a new timer value.


Mnemonic                    Opcode      Description
MWAITX                      0F 01 FB    Causes the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events.


Related Instructions 


MONITORX, MONITOR, MWAIT 


rFLAGS Affected 

None 

Exceptions 

Exception               Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD     X     X             X          MONITORX/MWAITX instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[MONITORX] = 0.
General protection, #GP X     X             X          Unsupported extension bits in ECX.
NEG Two’s Complement Negation


Performs the two’s complement negation of the value in the specified register or memory location by 
subtracting the value from 0. Use this instruction only on signed integer numbers. 


If the value is 0, the instruction clears the CF flag to 0; otherwise, it sets CF to 1. The OF, SF, ZF, AF, 
and PF flag settings depend on the result of the operation. 
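
A minimal sketch of the negation and a few of its flag results follows. The helper name neg is hypothetical. The CF rule comes from the text above; the OF rule (set only when the operand is the most negative value, which has no positive counterpart) is an assumption based on ordinary two's complement overflow, and AF and PF are not modeled.

```python
def neg(value, bits):
    """Return (result, flags) for a two's complement negation of a bits-wide value."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    value &= mask
    result = (0 - value) & mask  # subtract the value from 0
    flags = {
        "CF": int(value != 0),          # cleared only when the operand is 0
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        "OF": int(value == sign),       # assumption: overflow on the most negative value
    }
    return result, flags
```

Negating the 8-bit value 80h returns 80h again, which is the one case in which the model reports overflow.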


The forms of the NEG instruction that write to memory support the LOCK prefix. For details about the 
LOCK prefix, see “Lock Prefix” on page 11. 


Mnemonic                    Opcode      Description
NEG reg/mem8                F6 /3       Performs a two’s complement negation on an 8-bit register or memory operand.
NEG reg/mem16               F7 /3       Performs a two’s complement negation on a 16-bit register or memory operand.
NEG reg/mem32               F7 /3       Performs a two’s complement negation on a 32-bit register or memory operand.
NEG reg/mem64               F7 /3       Performs a two’s complement negation on a 64-bit register or memory operand.


Related Instructions 


AND, NOT, OR, XOR 


rFLAGS Affected 
ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          M                   M    M    M    M    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.
Exceptions

Exception               Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                            X          The destination operand was in a non-writable segment.
                                            X          A null data segment was used to reference memory.
Page fault, #PF               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X             X          An unaligned memory reference was performed while alignment checking was enabled.
NOP No Operation


Does nothing. This instruction increments the rIP to point to the next instruction, but does not affect the
machine state in any other way.


The single-byte variant is an alias for XCHG rAX, rAX. 


Mnemonic                    Opcode      Description
NOP                         90          Performs no operation.
NOP reg/mem16               0F 1F /0    Performs no operation on a 16-bit register or memory operand.
NOP reg/mem32               0F 1F /0    Performs no operation on a 32-bit register or memory operand.
NOP reg/mem64               0F 1F /0    Performs no operation on a 64-bit register or memory operand.


Related Instructions
None

rFLAGS Affected
None

Exceptions
None
NOT One’s Complement Negation


Performs the one’s complement negation of the value in the specified register or memory location by 
inverting each bit of the value. 


The memory-operand forms of the NOT instruction support the LOCK prefix. For details about the 
LOCK prefix, see “Lock Prefix” on page 11. 


Mnemonic                    Opcode      Description
NOT reg/mem8                F6 /2       Complements the bits in an 8-bit register or memory operand.
NOT reg/mem16               F7 /2       Complements the bits in a 16-bit register or memory operand.
NOT reg/mem32               F7 /2       Complements the bits in a 32-bit register or memory operand.
NOT reg/mem64               F7 /2       Complements the bits in a 64-bit register or memory operand.
Related Instructions 
AND, NEG, OR, XOR 
rFLAGS Affected 
None 
Exceptions 
Exception               Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                            X          The destination operand was in a non-writable segment.
                                            X          A null data segment was used to reference memory.
Page fault, #PF               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X             X          An unaligned memory reference was performed while alignment checking was enabled.
OR Logical OR


Performs a logical OR on the bits in a register, memory location, or immediate value (second operand)
and a register or memory location (first operand) and stores the result in the first operand location. The
two operands cannot both be memory locations.


If both corresponding bits are 0, the corresponding bit of the result is 0; otherwise, the corresponding 
result bit is 1. 


The forms of the OR instruction that write to memory support the LOCK prefix. For details about the 
LOCK prefix, see “Lock Prefix” on page 11. 


Mnemonic                    Opcode      Description
OR AL, imm8                 0C ib       Or the contents of AL with an immediate 8-bit value.
OR AX, imm16                0D iw       Or the contents of AX with an immediate 16-bit value.
OR EAX, imm32               0D id       Or the contents of EAX with an immediate 32-bit value.
OR RAX, imm32               0D id       Or the contents of RAX with a sign-extended immediate 32-bit value.
OR reg/mem8, imm8           80 /1 ib    Or the contents of an 8-bit register or memory operand and an immediate 8-bit value.
OR reg/mem16, imm16         81 /1 iw    Or the contents of a 16-bit register or memory operand and an immediate 16-bit value.
OR reg/mem32, imm32         81 /1 id    Or the contents of a 32-bit register or memory operand and an immediate 32-bit value.
OR reg/mem64, imm32         81 /1 id    Or the contents of a 64-bit register or memory operand and a sign-extended immediate 32-bit value.
OR reg/mem16, imm8          83 /1 ib    Or the contents of a 16-bit register or memory operand and a sign-extended immediate 8-bit value.
OR reg/mem32, imm8          83 /1 ib    Or the contents of a 32-bit register or memory operand and a sign-extended immediate 8-bit value.
OR reg/mem64, imm8          83 /1 ib    Or the contents of a 64-bit register or memory operand and a sign-extended immediate 8-bit value.
OR reg/mem8, reg8           08 /r       Or the contents of an 8-bit register or memory operand with the contents of an 8-bit register.
OR reg/mem16, reg16         09 /r       Or the contents of a 16-bit register or memory operand with the contents of a 16-bit register.
OR reg/mem32, reg32         09 /r       Or the contents of a 32-bit register or memory operand with the contents of a 32-bit register.
OR reg/mem64, reg64         09 /r       Or the contents of a 64-bit register or memory operand with the contents of a 64-bit register.
OR reg8, reg/mem8           0A /r       Or the contents of an 8-bit register with the contents of an 8-bit register or memory operand.
OR reg16, reg/mem16         0B /r       Or the contents of a 16-bit register with the contents of a 16-bit register or memory operand.
OR reg32, reg/mem32         0B /r       Or the contents of a 32-bit register with the contents of a 32-bit register or memory operand.
OR reg64, reg/mem64         0B /r       Or the contents of a 64-bit register with the contents of a 64-bit register or memory operand.


The following chart summarizes the effect of this instruction: 


X    Y    X OR Y
0    0    0
0    1    1
1    0    1
1    1    1


Related Instructions 


AND, NEG, NOT, XOR 


rFLAGS Affected 

ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          0                   M    M    U    M    0
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.
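
The result and flag effects of the OR instruction can be sketched as a small model. The helper name or_with_flags is hypothetical. OF and CF are cleared as the flag table indicates, while the SF, ZF, and PF rules (sign bit of the result, zero result, even parity of the low byte) are stated here as assumptions about the usual flag definitions. AF is undefined and omitted.

```python
def or_with_flags(dest, src, bits):
    """Return (result, flags) for a bits-wide OR of dest and src."""
    mask = (1 << bits) - 1
    result = (dest | src) & mask
    low_byte_ones = bin(result & 0xFF).count("1")
    flags = {
        "OF": 0,                               # always cleared
        "CF": 0,                               # always cleared
        "SF": (result >> (bits - 1)) & 1,      # most significant bit of the result
        "ZF": int(result == 0),
        "PF": int(low_byte_ones % 2 == 0),     # assumption: even parity of the low byte
    }
    return result, flags
```

A common idiom that relies on these flag effects is OR of a register with itself, which leaves the value unchanged while setting ZF and SF from it.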


Exceptions 
Exception               Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                            X          The destination operand was in a non-writable segment.
                                            X          A null data segment was used to reference memory.
Page fault, #PF               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X             X          An unaligned memory reference was performed while alignment checking was enabled.
OUT Output to Port


Copies the value from the AL, AX, or EAX register (second operand) to an I/O port (first operand).
The port address can be a byte-immediate value (00h to FFh) or the value in the DX register (0000h to
FFFFh). The source register used determines the size of the port (8, 16, or 32 bits).


If the operand size is 64 bits, OUT only writes to a 32-bit I/O port.


If the CPL is higher than the IOPL or the mode is virtual mode, OUT checks the I/O permission bitmap
in the TSS before allowing access to the I/O port. See Volume 2 for details on the TSS I/O permission
bitmap.


Mnemonic                    Opcode      Description
OUT imm8, AL                E6 ib       Output the byte in the AL register to the port specified by an 8-bit immediate value.
OUT imm8, AX                E7 ib       Output the word in the AX register to the port specified by an 8-bit immediate value.
OUT imm8, EAX               E7 ib       Output the doubleword in the EAX register to the port specified by an 8-bit immediate value.
OUT DX, AL                  EE          Output byte in AL to the output port specified in DX.
OUT DX, AX                  EF          Output word in AX to the output port specified in DX.
OUT DX, EAX                 EF          Output doubleword in EAX to the output port specified in DX.

Related Instructions 
IN, INSx, OUTSx 
rFLAGS Affected 
None 
Exceptions 
Exception               Real  Virtual 8086  Protected  Cause of Exception
General protection, #GP       X                        One or more I/O permission bits were set in the TSS for the accessed port.
                                            X          The CPL was greater than the IOPL and one or more I/O permission bits were set in the TSS for the accessed port.
Page fault, #PF               X             X          A page fault resulted from the execution of the instruction.
OUTS Output String
OUTSB 
OUTSW 
OUTSD 


Copies data from the memory location pointed to by DS:rSI to the I/O port address (0000h to FFFFh) 
specified in the DX register, and then increments or decrements the rSI register according to the setting 
of the DF flag in the rFLAGS register. 


If the DF flag is 0, the instruction increments rSI; otherwise, it decrements rSI. It increments or 
decrements the pointer by 1, 2, or 4, depending on the size of the value being copied. 


The OUTS DX mnemonic uses an explicit memory operand (second operand) to determine the type 
(size) of the value being copied, but always uses DS:rSI for the location of the value to copy. The 
explicit register operand (first operand) specifies the I/O port address and must always be DX. 


The no-operands forms of the mnemonic use the DS:rSI register pair to point to the memory data to be 
copied and the contents of the DX register as the destination I/O port address. The mnemonic specifies 
the size of the I/O port and the type (size) of the value being copied. 


The OUTSx instruction supports the REP prefix. For details about the REP prefix, see “Repeat 
Prefixes” on page 12. 


If the effective operand size is 64 bits, the instruction behaves as if the operand size were 32 bits.


If the CPL is higher than the IOPL or the mode is virtual mode, OUTSx checks the I/O permission 
bitmap in the TSS before allowing access to the I/O port. See Volume 2 for details on the TSS I/O 
permission bitmap. 


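Mnemonic                    Opcode      Description
OUTS DX, mem8               6E          Output the byte in DS:rSI to the port specified in DX, then increment or decrement rSI.
OUTS DX, mem16              6F          Output the word in DS:rSI to the port specified in DX, then increment or decrement rSI.
OUTS DX, mem32              6F          Output the doubleword in DS:rSI to the port specified in DX, then increment or decrement rSI.
OUTSB                       6E          Output the byte in DS:rSI to the port specified in DX, then increment or decrement rSI.
OUTSW                       6F          Output the word in DS:rSI to the port specified in DX, then increment or decrement rSI.
OUTSD                       6F          Output the doubleword in DS:rSI to the port specified in DX, then increment or decrement rSI.
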
Related Instructions
IN, INSx, OUT

rFLAGS Affected
None

Exceptions

Exception               Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS              X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                            X          A null data segment was used to reference memory.
                              X                        One or more I/O permission bits were set in the TSS for the accessed port.
                                            X          The CPL was greater than the IOPL and one or more I/O permission bits were set in the TSS for the accessed port.
Page fault, #PF               X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X             X          An unaligned memory reference was performed while alignment checking was enabled.
PAUSE Pause


Improves the performance of spin loops, by providing a hint to the processor that the current code is in 
a spin loop. The processor may use this to optimize power consumption while in the spin loop. 


Architecturally, this instruction behaves like a NOP instruction. 


Processors that do not support PAUSE treat this opcode as a NOP instruction. 


Mnemonic                    Opcode      Description
PAUSE                       F3 90       Provides a hint to the processor that a spin loop is being executed.


Related Instructions 
None 

rFLAGS Affected 
None 

Exceptions 


None 
PDEP Parallel Deposit Bits


Scatters consecutive bits of the first source operand, starting at the least significant bit, to bit positions 
in the destination as specified by 1 bits in the second source operand (mask). Bit positions in the 
destination corresponding to 0 bits in the mask are cleared. 


This instruction has three operands: 
PDEP dest, src, mask 


The following diagram illustrates the operation of this instruction. 


[Figure: PDEP operation. Consecutive low-order bits of src (b0, b1, b2, ...) are deposited into the dest
bit positions that correspond to 1 bits in mask; dest bits that correspond to 0 bits in mask are cleared
to 0. Source artwork: v3_PDEP_instruct.eps]


If the mask is all ones, the execution of this instruction effectively copies the source to the destination. 
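
The scatter operation can be sketched as a bit-by-bit software model. The function name pdep and its bits parameter are assumptions for illustration, and the loop favors clarity over speed.

```python
def pdep(src, mask, bits=64):
    """Deposit consecutive low-order bits of src at the positions of 1 bits in mask."""
    result = 0
    next_src_bit = 0  # index of the next unused source bit
    for position in range(bits):
        if (mask >> position) & 1:
            bit = (src >> next_src_bit) & 1
            result |= bit << position
            next_src_bit += 1
    return result
```

With a mask of 1101_0010b, the four low-order source bits 1011b land in bit positions 1, 4, 6, and 7, giving 1001_0010b. With an all-ones mask the result equals the source.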


In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand 
size is 64 bits; if VEX.W is 0, the operand size is 32 bits. In 32-bit mode, VEX.W is ignored. 16-bit 
operands are not supported. 


The destination (dest) and the source (src) are general-purpose registers. The second source operand 
(mask) is either a general-purpose register or a memory operand. 


This instruction is a BMI2 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
PDEP reg32, reg32, reg/mem32    C4    RXB.02           0.src.0.11    F5 /r
PDEP reg64, reg64, reg/mem64    C4    RXB.02           1.src.0.11    F5 /r
Related Instructions


rFLAGS Affected


None.


Exceptions

Exception               Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD     X     X                        BMI2 instructions are only recognized in protected mode.
                                            X          BMI2 instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI2] = 0.
                                            X          VEX.L is 1.
Stack, #SS                                  X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X          A memory address exceeded a data segment limit or was non-canonical.
                                            X          A null data segment was used to reference memory.
Page fault, #PF                             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X          An unaligned memory reference was performed while alignment checking was enabled.
PEXT Parallel Extract Bits


Copies bits from the source operand, based on a mask, and packs them into the low-order bits of the 
destination. Clears all bits in the destination to the left of the most-significant bit copied. 


This instruction has three operands: 
PEXT dest, src, mask 


The following diagram illustrates the operation of this instruction. 

[Figure: PEXT operation. The src bits that correspond to 1 bits in mask are packed into the low-order
bits of dest; the remaining high-order dest bits are cleared to 0. Source artwork: v3_PEXT_instruct.eps]


If the mask is all ones, the execution of this instruction effectively copies the source to the destination. 
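
The gather operation can be sketched as a bit-by-bit software model. The function name pext and its bits parameter are assumptions for illustration, and the loop favors clarity over speed.

```python
def pext(src, mask, bits=64):
    """Pack the src bits selected by 1 bits in mask into the low-order result bits."""
    result = 0
    next_dest_bit = 0  # index of the next free low-order result bit
    for position in range(bits):
        if (mask >> position) & 1:
            bit = (src >> position) & 1
            result |= bit << next_dest_bit
            next_dest_bit += 1
    return result
```

With a mask of 1101_0010b and a source of 1001_0010b, the bits at positions 1, 4, 6, and 7 are packed into 1011b, and all higher result bits stay 0.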


In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand 
size is 64 bits; if VEX.W is 0, the operand size is 32 bits. In 32-bit mode, VEX.W is ignored. 16-bit 
operands are not supported. 


The destination (dest) and the source (src) are general-purpose registers. The second source operand 
(mask) is either a general-purpose register or a memory operand. 


This instruction is a BMI2 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic                        Encoding
                                VEX   RXB.map_select   W.vvvv.L.pp   Opcode
PEXT reg32, reg32, reg/mem32    C4    RXB.02           0.src.0.10    F5 /r
PEXT reg64, reg64, reg/mem64    C4    RXB.02           1.src.0.10    F5 /r
Related Instructions


rFLAGS Affected


None.


Exceptions

Exception               Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD     X     X                        BMI2 instructions are only recognized in protected mode.
                                            X          BMI2 instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI2] = 0.
                                            X          VEX.L is 1.
Stack, #SS                                  X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                     X          A memory address exceeded a data segment limit or was non-canonical.
                                            X          A null data segment was used to reference memory.
Page fault, #PF                             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X          An unaligned memory reference was performed while alignment checking was enabled.
POP Pop Stack


Copies the value pointed to by the stack pointer (SS:rSP) to the specified register or memory location 
and then increments the rSP by 2 for a 16-bit pop, 4 for a 32-bit pop, or 8 for a 64-bit pop. 


The operand-size attribute determines the amount by which the stack pointer is incremented (2, 4 or 8 
bytes). The stack-size attribute determines whether SP, ESP, or RSP is incremented. 
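
The read-then-increment behavior can be illustrated with a small C model. It is a sketch under simplifying assumptions: the stack is a flat byte array, rsp is an offset into it, and no segment limit or fault checks are modeled. The names pop_stack and pop_model are hypothetical.

```c
#include <stdint.h>

/* Hypothetical flat stack model: mem stands in for the stack segment
   and rsp is a byte offset into it. */
struct pop_stack {
    uint8_t  mem[64];
    uint64_t rsp;
};

/* Reads size bytes (2, 4, or 8) at rsp in little-endian order, then
   increments rsp by the same amount. */
uint64_t pop_model(struct pop_stack *s, unsigned size)
{
    uint64_t value = 0;

    for (unsigned i = 0; i < size; i++)
        value |= (uint64_t)s->mem[s->rsp + i] << (8 * i);

    s->rsp += size;                       /* 2, 4, or 8 */
    return value;
}
```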


For forms of the instruction that load a segment register (POP DS, POP ES, POP FS, POP GS, POP 
SS), the source operand must be a valid segment selector. When a segment selector is popped into a 
segment register, the processor also loads all associated descriptor information into the hidden part of 
the register and validates it. 


It is possible to pop a null segment selector value (0000h to 0003h) into the DS, ES, FS, or GS register. 
This action does not cause a general protection fault, but a subsequent reference to such a segment 
does cause a #GP exception. For more information about segment selectors, see "Segment Selectors 
and Registers" in Volume 2: System Programming. 


In 64-bit mode, the POP operand size defaults to 64 bits and there is no prefix available to encode a
32-bit operand size. Using the POP DS, POP ES, or POP SS instruction in 64-bit mode generates an
invalid-opcode exception. 


This instruction cannot pop a value into the CS register. The RET (Far) instruction performs this 
function. 


Mnemonic                      Opcode        Description
POP reg/mem16                 8F /0         Pop the top of the stack into a 16-bit register or memory
                                            location.
POP reg/mem32                 8F /0         Pop the top of the stack into a 32-bit register or memory
                                            location. (No prefix for encoding this in 64-bit mode.)
POP reg/mem64                 8F /0         Pop the top of the stack into a 64-bit register or memory
                                            location.
POP reg16                     58 +rw        Pop the top of the stack into a 16-bit register.
POP reg32                     58 +rd        Pop the top of the stack into a 32-bit register.
                                            (No prefix for encoding this in 64-bit mode.)
POP reg64                     58 +rq        Pop the top of the stack into a 64-bit register.
POP DS                        1F            Pop the top of the stack into the DS register.
                                            (Invalid in 64-bit mode.)
POP ES                        07            Pop the top of the stack into the ES register.
                                            (Invalid in 64-bit mode.)
POP SS                        17            Pop the top of the stack into the SS register.
                                            (Invalid in 64-bit mode.)
POP FS                        0F A1         Pop the top of the stack into the FS register.
POP GS                        0F A9         Pop the top of the stack into the GS register.
Related Instructions 
PUSH 
rFLAGS Affected 
None 
Exceptions

Exception               Real  Virtual  Protected  Cause of Exception
                              8086
Invalid opcode, #UD                    X          POP DS, POP ES, or POP SS was executed in 64-bit mode.
Segment not present,                   X          The DS, ES, FS, or GS register was loaded with a non-null
#NP (selector)                                    segment selector and the segment was marked not present.
Stack, #SS              X     X        X          A memory address exceeded the stack segment limit or was
                                                  non-canonical.
Stack, #SS (selector)                  X          The SS register was loaded with a non-null segment selector
                                                  and the segment was marked not present.
General protection, #GP X     X        X          A memory address exceeded a data segment limit or was
                                                  non-canonical.
                                       X          The destination operand was in a non-writable segment.
                                       X          A null data segment was used to reference memory.
General protection,                    X          A segment register was loaded and the segment descriptor
#GP (selector)                                    exceeded the descriptor table limit.
                                       X          A segment register was loaded and the segment selector's TI
                                                  bit was set, but the LDT selector was a null selector.
                                       X          The SS register was loaded with a null segment selector in
                                                  non-64-bit mode or while CPL = 3.
                                       X          The SS register was loaded and the segment selector RPL
                                                  and the segment descriptor DPL were not equal to the CPL.
                                       X          The SS register was loaded and the segment pointed to was
                                                  not a writable data segment.
                                       X          The DS, ES, FS, or GS register was loaded and the segment
                                                  pointed to was a data or non-conforming code segment, but
                                                  the RPL or the CPL was greater than the DPL.
                                       X          The DS, ES, FS, or GS register was loaded and the segment
                                                  pointed to was not a data segment or readable code segment.
Page fault, #PF               X        X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X        X          An unaligned memory reference was performed while
                                                  alignment checking was enabled.
POPA POP All GPRs
POPAD 


Pops words or doublewords from the stack into the general-purpose registers in the following order: 
eDI, eSI, eBP, eSP (image is popped and discarded), eBX, eDX, eCX, and eAX. The instruction 
increments the stack pointer by 16 or 32, depending on the operand size. 


Using the POPA or POPAD instructions in 64-bit mode generates an invalid-opcode exception. 


Mnemonic                      Opcode        Description
POPA                          61            Pop the DI, SI, BP, SP, BX, DX, CX, and AX registers.
                                            (Invalid in 64-bit mode.)
POPAD                         61            Pop the EDI, ESI, EBP, ESP, EBX, EDX, ECX, and EAX
                                            registers. (Invalid in 64-bit mode.)


Related Instructions 


PUSHA, PUSHAD 


rFLAGS Affected 
None 
Exceptions

Exception               Real  Virtual  Protected  Cause of Exception
                              8086
Invalid opcode, #UD                    X          This instruction was executed in 64-bit mode.
Stack, #SS              X     X        X          A memory address exceeded the stack segment limit.
Page fault, #PF               X        X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X        X          An unaligned memory reference was performed while
                                                  alignment checking was enabled.
POPCNT Bit Population Count


Counts the number of bits having a value of 1 in the source operand and places the result in the 
destination register. The source operand is a 16-, 32-, or 64-bit general purpose register or memory 
operand; the destination operand is a general purpose register of the same size as the source operand 
register. 


If the input operand is zero, the ZF flag is set to 1 and zero is written to the destination register. 
Otherwise, the ZF flag is cleared. The other flags are cleared. 
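
As an illustration of the result and ZF behavior described above, the following C sketch models the 32-bit form in software. The struct and function names are hypothetical, and only ZF is modeled because the remaining arithmetic flags are simply cleared.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical result type: the count written to the destination
   register plus the resulting ZF value. */
struct popcnt_result {
    uint32_t count;
    bool     zf;
};

struct popcnt_result popcnt32_model(uint32_t src)
{
    struct popcnt_result r = { 0, src == 0 };   /* ZF = 1 only for a zero source */

    while (src != 0) {
        src &= src - 1;                         /* clear the lowest set bit */
        r.count++;
    }
    return r;
}
```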


Support for the POPCNT instruction is indicated by CPUID Fn0000_0001_ECX[POPCNT] = 1. 
Software MUST check the CPUID bit once per program or library initialization before using the 
POPCNT instruction, or inconsistent behavior may result. 
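
One way to follow the check-once rule is to decide at initialization which implementation to call. The sketch below is an assumption-laden illustration: it takes the ECX value already returned by CPUID Fn0000_0001 as a parameter, assumes the POPCNT flag is bit 23 of that register (verify against Appendix D), and uses hypothetical names throughout.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumed bit position of the POPCNT feature flag in CPUID Fn0000_0001_ECX. */
#define CPUID_FN1_ECX_POPCNT (UINT32_C(1) << 23)

typedef unsigned (*popcount_fn)(uint32_t);

bool popcnt_supported(uint32_t cpuid_fn1_ecx)
{
    return (cpuid_fn1_ecx & CPUID_FN1_ECX_POPCNT) != 0;
}

/* Portable fallback used when the feature flag is clear. */
static unsigned popcount_fallback(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Called once during program or library initialization. hw_impl is a
   caller-supplied routine that executes POPCNT; it is returned only
   when the feature flag is set. */
popcount_fn select_popcount(uint32_t cpuid_fn1_ecx, popcount_fn hw_impl)
{
    return popcnt_supported(cpuid_fn1_ecx) ? hw_impl : popcount_fallback;
}
```

The caller stores the returned pointer and uses it for all later counts, so the CPUID result is consulted a single time.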


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic                      Opcode        Description

POPCNT reg16, reg/mem16       F3 0F B8 /r   Count the 1s in reg/mem16.
POPCNT reg32, reg/mem32       F3 0F B8 /r   Count the 1s in reg/mem32.
POPCNT reg64, reg/mem64       F3 0F B8 /r   Count the 1s in reg/mem64.


Related Instructions 


BSF, BSR, LZCNT 


rFLAGS Affected 
ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
                                     0               0   M   0   0   0
21  20   19   18  17  16  14  13:12  11  10  9   8   7   6   4   2   0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 
Exceptions

Exception               Real  Virtual  Protected  Cause of Exception
                              8086
Invalid opcode, #UD     X     X        X          The POPCNT instruction is not supported, as indicated by
                                                  CPUID Fn0000_0001_ECX[POPCNT] = 0.
Stack, #SS              X     X        X          A memory address exceeded the stack segment limit or was
                                                  non-canonical.
General protection, #GP X     X        X          A memory address exceeded a data segment limit or was
                                                  non-canonical.
                                       X          A null data segment was used to reference memory.
Page fault, #PF               X        X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X        X          An unaligned memory reference was performed while
                                                  alignment checking was enabled.
POPF POP to rFLAGS
POPFD 
POPFQ 


Pops a word, doubleword, or quadword from the stack into the rFLAGS register and then increments 
the stack pointer by 2, 4, or 8, depending on the operand size. 


In protected or real mode, all the non-reserved flags in the rFLAGS register can be modified, except 
the VIP, VIF, and VM flags, which are unchanged. In protected mode, at a privilege level greater than 
0 the IOPL is also unchanged. The instruction alters the interrupt flag (IF) only when the CPL is less 
than or equal to the IOPL. 
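
The protected-mode rules above amount to building a mask of writable flag bits. The following C sketch models them for a 32-bit operand size. It is illustrative only: the macro and function names are hypothetical, reserved bits are simply carried over from the old value, and stack access and faults are not modeled.

```c
#include <stdint.h>

#define FL_IF      (UINT32_C(1) << 9)
#define FL_IOPL    (UINT32_C(3) << 12)
#define FL_RF      (UINT32_C(1) << 16)
#define FL_VM      (UINT32_C(1) << 17)
#define FL_VIF     (UINT32_C(1) << 19)
#define FL_VIP     (UINT32_C(1) << 20)
/* All non-reserved bits of bits 21:0 (bits 15, 5, 3, and 1 are reserved). */
#define FL_DEFINED UINT32_C(0x003F7FD5)

/* Returns the new flags value after a protected-mode POPFD of popped
   at privilege level cpl (0..3). */
uint32_t popf_protected_model(uint32_t old_flags, uint32_t popped, unsigned cpl)
{
    unsigned iopl = (old_flags >> 12) & 3u;
    uint32_t writable = FL_DEFINED & ~(FL_VIF | FL_VIP | FL_VM);

    if (cpl != 0)
        writable &= ~FL_IOPL;             /* IOPL changes only at CPL 0 */
    if (cpl > iopl)
        writable &= ~FL_IF;               /* IF changes only if CPL <= IOPL */

    uint32_t result = (old_flags & ~writable) | (popped & writable);
    return result & ~FL_RF;               /* RF is cleared */
}
```

For example, at CPL 3 with IOPL 0, an attempt to clear IF or raise IOPL leaves both unchanged and raises no exception, which is why software cannot rely on POPF to detect that the change was ignored.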


In virtual-8086 mode, if the IOPL field is less than 3, attempting to execute a POPFx or PUSHFx 
instruction while VME is not enabled, or while the operand size is not 16-bit, generates a #GP exception. 


In 64-bit mode, this instruction defaults to a 64-bit operand size; there is no prefix available to encode 
a 32-bit operand size. 


Mnemonic                      Opcode        Description
POPF                          9D            Pop a word from the stack into the FLAGS register.
POPFD                         9D            Pop a doubleword from the stack into the EFLAGS
                                            register. (No prefix for encoding this in 64-bit mode.)
POPFQ                         9D            Pop a quadword from the stack to the RFLAGS register.
Action

// See “Pseudocode Definition” on page 57.

POPF_START:

IF (REAL_MODE)
    POPF_REAL
ELSIF (PROTECTED_MODE)
    POPF_PROTECTED
ELSE // (VIRTUAL_MODE)
    POPF_VIRTUAL

POPF_REAL:

    POP.v temp_RFLAGS
    RFLAGS.v = temp_RFLAGS          // VIF,VIP,VM unchanged
                                    // RF cleared
    EXIT

POPF_PROTECTED:

    POP.v temp_RFLAGS
    RFLAGS.v = temp_RFLAGS          // VIF,VIP,VM unchanged
                                    // IOPL changed only if (CPL==0)
                                    // IF changed only if (CPL<=old_RFLAGS.IOPL)
                                    // RF cleared
    EXIT

POPF_VIRTUAL:

    IF (RFLAGS.IOPL==3)
    {
        POP.v temp_RFLAGS
        RFLAGS.v = temp_RFLAGS      // VIF,VIP,VM,IOPL unchanged
                                    // RF cleared
        EXIT
    }
    ELSIF ((CR4.VME==1) && (OPERAND_SIZE==16))
    {
        POP.w temp_RFLAGS
        IF (((temp_RFLAGS.IF==1) && (RFLAGS.VIP==1)) || (temp_RFLAGS.TF==1))
            EXCEPTION [#GP(0)]
                                    // notify the virtual-mode-manager to deliver
                                    // the task’s pending interrupts
        RFLAGS.w = temp_RFLAGS      // IF,IOPL unchanged
                                    // RFLAGS.VIF=temp_RFLAGS.IF
                                    // RF cleared
        EXIT
    }
    ELSE // ((RFLAGS.IOPL<3) && ((CR4.VME==0) || (OPERAND_SIZE!=16)))
        EXCEPTION [#GP(0)]

Related Instructions 


PUSHF, PUSHFD, PUSHFQ 


rFLAGS Affected 


ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
M        M    M       0   M   M      M   M   M   M   M   M   M   M   M
21  20   19   18  17  16  14  13:12  11  10  9   8   7   6   4   2   0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 
Exceptions

Exception               Real  Virtual  Protected  Cause of Exception
                              8086
Stack, #SS              X     X        X          A memory address exceeded the stack segment limit or was
                                                  non-canonical.
General protection, #GP       X                   The I/O privilege level was less than 3 and one of the
                                                  following conditions was true:
                                                  • CR4.VME was 0.
                                                  • The effective operand size was 32-bit.
                                                  • Both the original EFLAGS.VIP and the new EFLAGS.IF bits
                                                    were set.
                                                  • The new EFLAGS.TF bit was set.
Page fault, #PF               X        X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X        X          An unaligned memory reference was performed while
                                                  alignment checking was enabled.
PREFETCH Prefetch L1 Data-Cache Line
PREFETCHW 


Loads the entire 64-byte aligned memory sequence containing the specified memory address into the 
L1 data cache. The position of the specified memory address within the 64-byte cache line is 
irrelevant. If a cache hit occurs, or if a memory fault is detected, no bus cycle is initiated and the 
instruction is treated as a NOP. 


The PREFETCHW instruction loads the prefetched line and sets the cache-line state to Modified, in 
anticipation of subsequent data writes to the line. The PREFETCH instruction, by contrast, typically 
sets the cache-line state to Exclusive (depending on the hardware implementation). 


The opcodes for the PREFETCH/PREFETCHW instructions include the ModRM byte; however, only 
the memory form of ModRM is valid. The register form of ModRM causes an invalid-opcode 
exception. Because there is no destination register, the three destination register field bits of the 
ModRM byte define the type of prefetch to be performed. The bit patterns 000b and 001b define the 
PREFETCH and PREFETCHW instructions, respectively. All other bit patterns are reserved for future 
use. 


The reserved PREFETCH types do not result in an invalid-opcode exception if executed. Instead, for 
forward compatibility with future processors that may implement additional forms of the PREFETCH 
instruction, all reserved PREFETCH types are implemented as synonyms of the basic PREFETCH 
type (the PREFETCH instruction with type 000b). 
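
The ModRM rules for opcode 0F 0D described above can be summarized in a short decoder. This C sketch is illustrative only and uses hypothetical names; it classifies a ModRM byte according to the text, treating every reserved type as a synonym of the basic PREFETCH.

```c
#include <stdint.h>

enum prefetch_kind {
    PREFETCH_INVALID,   /* register form: invalid-opcode exception */
    PREFETCH_PLAIN,     /* PREFETCH, including reserved types */
    PREFETCH_W          /* PREFETCHW */
};

/* Classifies the ModRM byte that follows opcode 0F 0D. */
enum prefetch_kind decode_0f0d_modrm(uint8_t modrm)
{
    unsigned mod = modrm >> 6;            /* 11b selects the register form */
    unsigned reg = (modrm >> 3) & 7u;     /* prefetch type field */

    if (mod == 3)
        return PREFETCH_INVALID;
    if (reg == 1)
        return PREFETCH_W;                /* 001b */
    return PREFETCH_PLAIN;                /* 000b and all reserved patterns */
}
```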


The operation of these instructions is implementation-dependent. The processor implementation can 
ignore or change these instructions. The size of the cache line also depends on the implementation, 
with a minimum size of 32 bytes. For details on the use of this instruction, see the processor data sheets 
or other software-optimization documentation relating to particular hardware implementations. 


When paging is enabled and PREFETCHW performs a prefetch from a writable page, it may set the 
PTE Dirty bit to 1. 


Support for the PREFETCH and PREFETCHW instructions is indicated by CPUID 
Fn8000_0001_ECX[3DNowPrefetch] OR Fn8000_0001_EDX[LM] OR 
Fn8000_0001_EDX[3DNow] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic                      Opcode        Description
PREFETCH mem8                 0F 0D /0      Prefetch processor cache line into L1 data cache.
PREFETCHW mem8                0F 0D /1      Prefetch processor cache line into L1 data cache and
                                            mark it modified.
Related Instructions


PREFETCHlevel 

rFLAGS Affected 

None 

Exceptions

Exception (vector)      Real  Virtual  Protected  Cause of Exception
                              8086
Invalid opcode, #UD     X     X        X          PREFETCH and PREFETCHW instructions are not
                                                  supported, as indicated by CPUID
                                                  Fn8000_0001_ECX[3DNowPrefetch] AND
                                                  Fn8000_0001_EDX[LM] AND
                                                  Fn8000_0001_EDX[3DNow] = 0.
                        X     X        X          The operand was a register.
PREFETCHlevel Prefetch Data to Cache Level level


Loads a cache line from the specified memory address into the data-cache level specified by the 
locality reference bits 5:3 of the ModRM byte. Table 3-3 on page 285 lists the locality reference 
options for the instruction. 


This instruction loads a cache line even if the mem8 address is not aligned with the start of the line. If 
the cache line is already contained in a cache level that is lower than the specified locality reference, or 
if a memory fault is detected, a bus cycle is not initiated and the instruction is treated as a NOP. 
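
Because the whole line is loaded regardless of where the address falls within it, two addresses prefetch the same data whenever they share a line base. The C sketch below computes that base; it is only an illustration, the function name is hypothetical, and the line size is a parameter because it depends on the implementation.

```c
#include <stdint.h>

/* Returns the address of the first byte of the cache line containing
   addr. line_size must be a power of two (for example 32 or 64). */
uint64_t cache_line_base(uint64_t addr, uint64_t line_size)
{
    return addr & ~(line_size - 1);
}
```

Software that prefetches ahead in a loop can use this to avoid issuing more than one prefetch per line.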


The operation of this instruction is implementation-dependent. The processor implementation can 
ignore or change this instruction. The size of the cache line also depends on the implementation, with a 
minimum size of 32 bytes. AMD processors alias PREFETCHT1 and PREFETCHT2 to PREFETCHT0. 
For details on the use of this instruction, see the software-optimization documentation relating to 
particular hardware implementations. 


Mnemonic                      Opcode        Description
PREFETCHNTA mem8              0F 18 /0      Move data closer to the processor using the NTA
                                            reference.
PREFETCHT0 mem8               0F 18 /1      Move data closer to the processor using the T0
                                            reference.
PREFETCHT1 mem8               0F 18 /2      Move data closer to the processor using the T1
                                            reference.
PREFETCHT2 mem8               0F 18 /3      Move data closer to the processor using the T2
                                            reference.


Table 3-3. Locality References for the Prefetch Instructions

Locality
Reference   Description

NTA         Non-Temporal Access: Move the specified data into the processor with
            minimum cache pollution. This is intended for data that will be used only
            once, rather than repeatedly. The specific technique for minimizing cache
            pollution is implementation-dependent and may include such techniques
            as allocating space in a software-invisible buffer, allocating a cache line in
            only a single way, etc. For details, see the software-optimization
            documentation for a particular hardware implementation.

T0          All Cache Levels: Move the specified data into all cache levels.

T1          Level 2 and Higher: Move the specified data into all cache levels except
            0th level (L1) cache.

T2          Level 3 and Higher: Move the specified data into all cache levels except
            0th level (L1) and 1st level (L2) caches.


Related Instructions 


PREFETCH, PREFETCHW 
rFLAGS Affected


None 


Exceptions 


None 
PUSH Push onto Stack


Decrements the stack pointer and then copies the specified immediate value or the value in the 
specified register or memory location to the top of the stack (the memory location pointed to by 
SS:rSP). 


The operand-size attribute determines the number of bytes pushed to the stack. The stack-size attribute 
determines whether SP, ESP, or RSP is the stack pointer. The address-size attribute is used only to 
locate the memory operand when pushing a memory operand to the stack. 


If the instruction pushes the stack pointer (rSP), the resulting value on the stack is that of rSP before 
execution of the instruction. 
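
The decrement-then-store order, and the special case of pushing the stack pointer itself, can be illustrated with a small C model. This is a sketch under simplifying assumptions: the stack is a flat byte array, rsp is an offset into it, and no limit or fault checks are modeled. All names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical flat stack model: mem stands in for the stack segment
   and rsp is a byte offset into it. */
struct push_stack {
    uint8_t  mem[64];
    uint64_t rsp;
};

/* Decrements rsp by size (2, 4, or 8), then stores value at the new
   top of stack in little-endian order. */
void push_model(struct push_stack *s, uint64_t value, unsigned size)
{
    s->rsp -= size;
    for (unsigned i = 0; i < size; i++)
        s->mem[s->rsp + i] = (uint8_t)(value >> (8 * i));
}

/* Models PUSH rSP: the value stored is rSP as it was before the
   instruction executed. */
void push_rsp_model(struct push_stack *s)
{
    uint64_t old_rsp = s->rsp;            /* captured before the decrement */
    push_model(s, old_rsp, 8);
}
```

In the same model, a single 2-byte push starting from a 4-byte-aligned offset leaves rsp at an offset that is no longer a multiple of 4, which is the misalignment case noted for 16-bit operands.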


There is a PUSH CS instruction but no corresponding POP CS. The RET (Far) instruction pops a value 
from the top of stack into the CS register as part of its operation. 


In 64-bit mode, the operand size of all PUSH instructions defaults to 64 bits, and there is no prefix 
available to encode a 32-bit operand size. Using the PUSH CS, PUSH DS, PUSH ES, or PUSH SS 
instructions in 64-bit mode generates an invalid-opcode exception. 


Pushing an odd number of 16-bit operands when the stack address-size attribute is 32 results in a 
misaligned stack pointer. 


Mnemonic                      Opcode        Description
PUSH reg/mem16                FF /6         Push the contents of a 16-bit register or memory
                                            operand onto the stack.
PUSH reg/mem32                FF /6         Push the contents of a 32-bit register or memory
                                            operand onto the stack. (No prefix for encoding this in
                                            64-bit mode.)
PUSH reg/mem64                FF /6         Push the contents of a 64-bit register or memory
                                            operand onto the stack.
PUSH reg16                    50 +rw        Push the contents of a 16-bit register onto the stack.
PUSH reg32                    50 +rd        Push the contents of a 32-bit register onto the stack. (No
                                            prefix for encoding this in 64-bit mode.)
PUSH reg64                    50 +rq        Push the contents of a 64-bit register onto the stack.
PUSH imm8                     6A ib         Push an 8-bit immediate value (sign-extended to 16, 32,
                                            or 64 bits) onto the stack.
PUSH imm16                    68 iw         Push a 16-bit immediate value onto the stack.
PUSH imm32                    68 id         Push a 32-bit immediate value onto the stack. (No prefix
                                            for encoding this in 64-bit mode.)
PUSH imm64                    68 id         Push a sign-extended 32-bit immediate value onto the
                                            stack.
PUSH CS                       0E            Push the CS selector onto the stack. (Invalid in 64-bit
                                            mode.)
PUSH SS                       16            Push the SS selector onto the stack. (Invalid in 64-bit
                                            mode.)
PUSH DS                       1E            Push the DS selector onto the stack. (Invalid in 64-bit
                                            mode.)
PUSH ES                       06            Push the ES selector onto the stack. (Invalid in 64-bit
                                            mode.)
PUSH FS                       0F A0         Push the FS selector onto the stack.
PUSH GS                       0F A8         Push the GS selector onto the stack.
Related Instructions 
POP 
rFLAGS Affected 
None 
Exceptions

Exception               Real  Virtual  Protected  Cause of Exception
                              8086
Invalid opcode, #UD                    X          PUSH CS, PUSH DS, PUSH ES, or PUSH SS was executed
                                                  in 64-bit mode.
Stack, #SS              X     X        X          A memory address exceeded the stack segment limit or was
                                                  non-canonical.
General protection, #GP X     X        X          A memory address exceeded a data segment limit or was
                                                  non-canonical.
                                       X          A null data segment was used to reference memory.
Page fault, #PF               X        X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X        X          An unaligned memory reference was performed while
                                                  alignment checking was enabled.
PUSHA Push All GPRs onto Stack
PUSHAD 


Pushes the contents of the eAX, eCX, eDX, eBX, eSP (original value), eBP, eSI, and eDI general- 
purpose registers onto the stack in that order. This instruction decrements the stack pointer by 16 or 32 
depending on operand size. 


Using the PUSHA or PUSHAD instruction in 64-bit mode generates an invalid-opcode exception. 


Mnemonic                      Opcode        Description
PUSHA                         60            Push the contents of the AX, CX, DX, BX, original SP,
                                            BP, SI, and DI registers onto the stack.
                                            (Invalid in 64-bit mode.)
PUSHAD                        60            Push the contents of the EAX, ECX, EDX, EBX, original
                                            ESP, EBP, ESI, and EDI registers onto the stack.
                                            (Invalid in 64-bit mode.)


Related Instructions 


POPA, POPAD 
rFLAGS Affected 
None 
Exceptions

Exception               Real  Virtual  Protected  Cause of Exception
                              8086
Invalid opcode, #UD                    X          This instruction was executed in 64-bit mode.
Stack, #SS              X     X        X          A memory address exceeded the stack segment limit.
Page fault, #PF               X        X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X        X          An unaligned memory reference was performed while
                                                  alignment checking was enabled.
PUSHF Push rFLAGS onto Stack
PUSHFD 
PUSHFQ 


Decrements the rSP register and copies the rFLAGS register (except for the VM and RF flags) onto the 
stack. The instruction clears the VM and RF flags in the rFLAGS image before putting it on the stack. 
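
The flag image that reaches the stack can be expressed as simple mask operations. The C sketch below is illustrative and uses hypothetical names. The first function models the image with VM and RF cleared; the second models the 16-bit image pushed in virtual-8086 mode when CR4.VME is 1 and IOPL is less than 3, following the Action pseudocode for this instruction (VIF in the IF position, IOPL shown as 3).

```c
#include <stdint.h>

#define FL_IF    (UINT32_C(1) << 9)
#define FL_IOPL  (UINT32_C(3) << 12)
#define FL_RF    (UINT32_C(1) << 16)
#define FL_VM    (UINT32_C(1) << 17)
#define FL_VIF   (UINT32_C(1) << 19)

/* Image pushed by PUSHFD: the flags with VM and RF cleared. */
uint32_t pushf_image(uint32_t eflags)
{
    return eflags & ~(FL_RF | FL_VM);
}

/* Image pushed by a 16-bit PUSHF in virtual-8086 mode with VME enabled. */
uint16_t pushf_vme_image(uint32_t eflags)
{
    uint32_t image = eflags & 0xFFFFu;

    image &= ~FL_IF;
    if (eflags & FL_VIF)
        image |= FL_IF;                   /* VIF is reported in the IF position */
    image |= FL_IOPL;                     /* IOPL is reported as 3 */
    return (uint16_t)image;
}
```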


The instruction pushes 2, 4, or 8 bytes, depending on the operand size. 


In 64-bit mode, this instruction defaults to a 64-bit operand size and there is no prefix available to 
encode a 32-bit operand size. 


In virtual-8086 mode, if system software has set the IOPL field to a value less than 3, a general- 
protection exception occurs if application software attempts to execute PUSHFx or POPFx while 
VME is not enabled or the operand size is not 16-bit. 


Mnemonic                      Opcode        Description
PUSHF                         9C            Push the FLAGS word onto the stack.
PUSHFD                        9C            Push the EFLAGS doubleword onto the stack. (No prefix
                                            for encoding this in 64-bit mode.)
PUSHFQ                        9C            Push the RFLAGS quadword onto the stack.
Action

// See “Pseudocode Definition” on page 57.

PUSHF_START:

IF (REAL_MODE)
    PUSHF_REAL
ELSIF (PROTECTED_MODE)
    PUSHF_PROTECTED
ELSE // (VIRTUAL_MODE)
    PUSHF_VIRTUAL

PUSHF_REAL:
    PUSH.v old_RFLAGS           // Pushed with RF and VM cleared.
    EXIT

PUSHF_PROTECTED:
    PUSH.v old_RFLAGS           // Pushed with RF cleared.
    EXIT

PUSHF_VIRTUAL:
    IF (RFLAGS.IOPL==3)
    {
        PUSH.v old_RFLAGS       // Pushed with RF,VM cleared.
        EXIT
    }
    ELSIF ((CR4.VME==1) && (OPERAND_SIZE==16))
    {
        PUSH.v old_RFLAGS       // Pushed with VIF in the IF position.
                                // Pushed with IOPL=3.
        EXIT
    }
    ELSE // ((RFLAGS.IOPL<3) && ((CR4.VME==0) || (OPERAND_SIZE!=16)))
        EXCEPTION [#GP(0)]


Related Instructions 


POPF, POPFD, POPFQ 


rFLAGS Affected 
None 
Exceptions

Exception               Real  Virtual  Protected  Cause of Exception
                              8086
Stack, #SS              X     X        X          A memory address exceeded the stack segment limit or was
                                                  non-canonical.
General protection, #GP       X                   The I/O privilege level was less than 3 and either VME was
                                                  not enabled or the operand size was not 16-bit.
Page fault, #PF               X        X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X        X          An unaligned memory reference was performed while
                                                  alignment checking was enabled.
RCL Rotate Through Carry Left


Rotates the bits of a register or memory location (first operand) to the left (more significant bit 
positions) and through the carry flag by the number of bit positions in an unsigned immediate value or 
the CL register (second operand). The bits rotated through the carry flag are rotated back in at the right 
end (lsb) of the first operand location. 


The processor masks the upper three bits of the count operand, thus restricting the count to a number 
between 0 and 31. When the destination is 64 bits wide, the processor masks the upper two bits of the 
count, providing a count in the range of 0 to 63. 


For 1-bit rotates, the instruction sets the OF flag to the logical xor of the CF bit (after the rotate) and 
the most significant bit of the result. When the rotate count is greater than 1, the OF flag is undefined. 
When the rotate count is 0, no flags are affected. 
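
The rotate, count masking, and 1-bit OF rule above can be modeled in C for a 32-bit operand. This is an illustrative sketch with hypothetical names; it rotates one bit at a time for clarity rather than speed.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical result type: rotated value plus the resulting CF. */
struct rcl_result {
    uint32_t value;
    bool     cf;
};

/* Rotates the 33-bit quantity CF:value left by count positions. */
struct rcl_result rcl32_model(uint32_t value, bool cf, unsigned count)
{
    count &= 31u;                         /* count restricted to 0..31 */

    for (unsigned i = 0; i < count; i++) {
        bool msb = (value >> 31) & 1u;
        value = (value << 1) | (cf ? 1u : 0u);   /* old CF enters at the lsb */
        cf = msb;                                /* old msb becomes CF */
    }

    struct rcl_result r = { value, cf };
    return r;
}

/* OF after a 1-bit rotate: CF (after the rotate) XOR the msb of the result. */
bool rcl32_of_after_1bit(struct rcl_result r)
{
    bool msb = (r.value >> 31) & 1u;
    return r.cf != msb;
}
```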


Mnemonic                      Opcode        Description
RCL reg/mem8, 1               D0 /2         Rotate the 9 bits consisting of the carry flag and an 8-bit
                                            register or memory location left 1 bit.
RCL reg/mem8, CL              D2 /2         Rotate the 9 bits consisting of the carry flag and an 8-bit
                                            register or memory location left the number of bits
                                            specified in the CL register.
RCL reg/mem8, imm8            C0 /2 ib      Rotate the 9 bits consisting of the carry flag and an 8-bit
                                            register or memory location left the number of bits
                                            specified by an 8-bit immediate value.
RCL reg/mem16, 1              D1 /2         Rotate the 17 bits consisting of the carry flag and a 16-bit
                                            register or memory location left 1 bit.
RCL reg/mem16, CL             D3 /2         Rotate the 17 bits consisting of the carry flag and a 16-bit
                                            register or memory location left the number of bits
                                            specified in the CL register.
RCL reg/mem16, imm8           C1 /2 ib      Rotate the 17 bits consisting of the carry flag and a 16-bit
                                            register or memory location left the number of bits
                                            specified by an 8-bit immediate value.
RCL reg/mem32, 1              D1 /2         Rotate the 33 bits consisting of the carry flag and a 32-bit
                                            register or memory location left 1 bit.
RCL reg/mem32, CL             D3 /2         Rotate the 33 bits consisting of the carry flag and a 32-bit
                                            register or memory location left the number of bits
                                            specified in the CL register.
RCL reg/mem32, imm8           C1 /2 ib      Rotate the 33 bits consisting of the carry flag and a 32-bit
                                            register or memory location left the number of bits
                                            specified by an 8-bit immediate value.
RCL reg/mem64, 1              D1 /2         Rotate the 65 bits consisting of the carry flag and a 64-bit
                                            register or memory location left 1 bit.
RCL reg/mem64, CL             D3 /2         Rotate the 65 bits consisting of the carry flag and a 64-bit
                                            register or memory location left the number of bits
                                            specified in the CL register.
RCL reg/mem64, imm8           C1 /2 ib      Rotate the 65 bits consisting of the carry flag and a 64-bit
                                            register or memory location left the number of bits
                                            specified by an 8-bit immediate value.

Related Instructions

RCR, ROL, ROR



rFLAGS Affected 
ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
                                     M                               M
21  20   19   18  17  16  14  13:12  11  10  9   8   7   6   4   2   0
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 
Exceptions

Exception               Real  Virtual  Protected  Cause of Exception
                              8086
Stack, #SS              X     X        X          A memory address exceeded the stack segment limit or was
                                                  non-canonical.
General protection, #GP X     X        X          A memory address exceeded a data segment limit or was
                                                  non-canonical.
                                       X          The destination operand was in a non-writable segment.
                                       X          A null data segment was used to reference memory.
Page fault, #PF               X        X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X        X          An unaligned memory reference was performed while
                                                  alignment checking was enabled.
RCR Rotate Through Carry Right


Rotates the bits of a register or memory location (first operand) to the right (toward the less significant 
bit positions) and through the carry flag by the number of bit positions in an unsigned immediate value 
or the CL register (second operand). The bits rotated through the carry flag are rotated back in at the 
left end (msb) of the first operand location. 


The processor masks the upper three bits in the count operand, thus restricting the count to a number 
between 0 and 31. When the destination is 64 bits wide, the processor masks the upper two bits of the 
count, providing a count in the range of 0 to 63. 


For 1-bit rotates, the instruction sets the OF flag to the logical xor of the two most significant bits of 
the result. When the rotate count is greater than 1, the OF flag is undefined. When the rotate count is 0, 
no flags are affected. 
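
A matching C model for the 32-bit right rotate through carry is sketched below. As with any such model it is illustrative only; the names are hypothetical and the loop rotates one bit per iteration for clarity.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical result type: rotated value plus the resulting CF. */
struct rcr_result {
    uint32_t value;
    bool     cf;
};

/* Rotates the 33-bit quantity CF:value right by count positions. */
struct rcr_result rcr32_model(uint32_t value, bool cf, unsigned count)
{
    count &= 31u;                         /* count restricted to 0..31 */

    for (unsigned i = 0; i < count; i++) {
        bool lsb = value & 1u;
        value = (value >> 1) | ((uint32_t)cf << 31);   /* old CF enters at the msb */
        cf = lsb;                                      /* old lsb becomes CF */
    }

    struct rcr_result r = { value, cf };
    return r;
}

/* OF after a 1-bit rotate: XOR of the two most significant result bits. */
bool rcr32_of_after_1bit(struct rcr_result r)
{
    return (((r.value >> 31) ^ (r.value >> 30)) & 1u) != 0;
}
```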


Mnemonic                      Opcode        Description
RCR reg/mem8, 1               D0 /3         Rotate the 9 bits consisting of the carry flag and an 8-bit
                                            register or memory location right 1 bit.
RCR reg/mem8, CL              D2 /3         Rotate the 9 bits consisting of the carry flag and an 8-bit
                                            register or memory location right the number of bits
                                            specified in the CL register.
RCR reg/mem8, imm8            C0 /3 ib      Rotate the 9 bits consisting of the carry flag and an 8-bit
                                            register or memory location right the number of bits
                                            specified by an 8-bit immediate value.
RCR reg/mem16, 1              D1 /3         Rotate the 17 bits consisting of the carry flag and a 16-bit
                                            register or memory location right 1 bit.
RCR reg/mem16, CL             D3 /3         Rotate the 17 bits consisting of the carry flag and a 16-bit
                                            register or memory location right the number of bits
                                            specified in the CL register.
RCR reg/mem16, imm8           C1 /3 ib      Rotate the 17 bits consisting of the carry flag and a 16-bit
                                            register or memory location right the number of bits
                                            specified by an 8-bit immediate value.
RCR reg/mem32, 1              D1 /3         Rotate the 33 bits consisting of the carry flag and a 32-bit
                                            register or memory location right 1 bit.
RCR reg/mem32, CL             D3 /3         Rotate the 33 bits consisting of the carry flag and a 32-bit
                                            register or memory location right the number of bits
                                            specified in the CL register.
RCR reg/mem32, imm8           C1 /3 ib      Rotate the 33 bits consisting of the carry flag and a 32-bit
                                            register or memory location right the number of bits
                                            specified by an 8-bit immediate value.
RCR reg/mem64, 1              D1 /3         Rotate the 65 bits consisting of the carry flag and a 64-bit
                                            register or memory location right 1 bit.
RCR reg/mem64, CL             D3 /3         Rotate the 65 bits consisting of the carry flag and a 64-bit
                                            register or memory location right the number of bits
                                            specified in the CL register.
RCR reg/mem64, imm8           C1 /3 ib      Rotate the 65 bits consisting of the carry flag and a 64-bit
                                            register or memory location right the number of bits
                                            specified by an 8-bit immediate value.
Related Instructions 
RCL, ROR, ROL 
rFLAGS Affected 
ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
                                     M                               M
21  20   19   18  17  16  14  13:12  11  10  9   8   7   6   4   2   0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions

Exception               Real  Virtual  Protected  Cause of Exception
                              8086
Stack, #SS              X     X        X          A memory address exceeded the stack segment limit or was
                                                  non-canonical.
General protection, #GP X     X        X          A memory address exceeded a data segment limit or was
                                                  non-canonical.
                                       X          The destination operand was in a non-writable segment.
                                       X          A null data segment was used to reference memory.
Page fault, #PF               X        X          A page fault resulted from the execution of the instruction.
Alignment check, #AC          X        X          An unaligned memory reference was performed while
                                                  alignment checking was enabled.
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RDFSBASE Read FS.base 
RDGSBASE Read GS.base 


Copies the base field of the FS or GS segment descriptor to the specified register. When supported and 
enabled, these instructions can be executed at any processor privilege level. The RDFSBASE and 
RDGSBASE instructions are only defined in 64-bit mode. 


System software must set the FSGSBASE bit (bit 16) of CR4 to enable the RDFSBASE and 
RDGSBASE instructions. 


Support for this instruction is indicated by CPUID Fn0000_0007_EBX_x0[FSGSBASE] = 1. 
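
The three conditions above (64-bit mode, the CPUID feature flag, and the CR4 enable bit) can be combined into a single predicate. The following is a sketch under stated assumptions: the function name and parameters are hypothetical, the caller is assumed to have already obtained the CPUID Fn0000_0007_EBX_x0 value and the CR4 value, and the FSGSBASE feature flag is assumed to be bit 0 of that EBX value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit position of the FSGSBASE flag in CPUID Fn0000_0007_EBX_x0. */
#define CPUID_7_0_EBX_FSGSBASE  (UINT32_C(1) << 0)
/* CR4.FSGSBASE is bit 16, as stated in the text above. */
#define CR4_FSGSBASE            (UINT64_C(1) << 16)

/* Hypothetical helper: returns true when RDFSBASE/RDGSBASE are expected
 * to execute without raising #UD. */
static bool rdfsbase_usable(uint32_t cpuid_7_0_ebx, uint64_t cr4, bool in_64bit_mode)
{
    if (!in_64bit_mode)
        return false;                           /* only defined in 64-bit mode */
    if ((cpuid_7_0_ebx & CPUID_7_0_EBX_FSGSBASE) == 0)
        return false;                           /* not supported by the processor */
    return (cr4 & CR4_FSGSBASE) != 0;           /* supported, but must be enabled */
}
```

Application code normally cannot read CR4 directly, so in practice the enable state is reported by the operating system; that mechanism is outside the scope of this sketch.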


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic          Opcode         Description
RDFSBASE reg32    F3 0F AE /0    Copy the lower 32 bits of FS.base to the specified general-purpose register.
RDFSBASE reg64    F3 0F AE /0    Copy the entire 64-bit contents of FS.base to the specified general-purpose register.
RDGSBASE reg32    F3 0F AE /1    Copy the lower 32 bits of GS.base to the specified general-purpose register.
RDGSBASE reg64    F3 0F AE /1    Copy the entire 64-bit contents of GS.base to the specified general-purpose register.


Related Instructions 


WRFSBASE, WRGSBASE


rFLAGS Affected 
None. 
Exceptions 
Exception | Legacy | Compatibility | 64-bit | Cause of Exception
Invalid opcode, #UD | X | X | | Instruction is not valid in compatibility or legacy modes.
Invalid opcode, #UD | | | X | Instruction not supported as indicated by CPUID Fn0000_0007_EBX_x0[FSGSBASE] = 0 or, if supported, not enabled in CR4.
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RDPID Read Processor ID 


RDPID reads the value of the TSC_AUX MSR, which is also used by the RDTSCP instruction, into the
specified destination register. Operand-size prefixes do not apply; the destination update is either
32 bits or 64 bits wide, based on the current mode.


The RDPID instruction can be used to access the TSC_AUX value at CPL > 0 in cases where the 
operating system has disabled unprivileged execution of the RDTSCP instruction. 


The content of the TSC_AUX MSR, including how and even whether it actually indicates a processor 
ID, is a matter of operating system convention. 


The RDPID instruction is supported if the feature flag CPUID Fn0000_0007_X0_ECX[22]=1. 


Mnemonic Opcode Description 
RDPID F3 0F C7 /7 Read TSC_AUX
Related Instructions 


RDTSCP 
rFLAGS Affected 


None


ID | VIP | VIF | AC | VM | RF | NT IOPL OF | DF | IF | TF | SF | ZF | AF | PF | CF 
M 


21 | 20 | 19 | 18 | 17 | 16 | 14 13:12 11 | 10 9 8 7 6 4 2 0 


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported as indicated by CPUID Fn0000_0007_ECX[22] = 0.
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RDPRU Read Processor Register 


The RDPRU instruction gives access to some processor registers that are typically accessible only
when the privilege level is zero. ECX is the implicit operand that specifies which register to read.
RDPRU places the specified register’s value into EDX:EAX.


The RDPRU instruction normally can be executed at any privilege level. When CR4.TSD = 1, RDPRU
can be used only when the privilege level is zero. When CPL > 0 and CR4.TSD = 1, the RDPRU
instruction generates a #UD exception.


The RDPRU instruction is supported if the feature flag CPUID Fn8000_0008_EBX[4] = 1. The 16-bit
field in CPUID Fn8000_0008_EDX[31:16] returns the largest ECX value that returns a valid register.
Any unsupported ECX values return zero. Registers currently supported by ECX values are:


• ECX Value 0 = Register MPERF
• ECX Value 1 = Register APERF


Mnemonic Opcode Description 

RDPRU 0F 01 FD Copy register specified by ECX into EDX:EAX

rFLAGS Affected 

ID | VIP | VIF | AC | VM | RF | NT IOPL OF | DF | IF | TF | SF | ZF | AF | PF | CF
0 0 0 0 0 M 


21 | 20 | 19 | 18 | 17 | 16 | 14 13:12 11 | 10 9 8 7 6 4 2 0 


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported by CPUID Fn8000_0008_EBX[RDPRU] = 0 or CPL > 0 and CR4.TSD = 1.
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RDRAND Read Random 


Loads the destination register with a hardware-generated random value. 
The size of the returned value in bits is determined by the size of the destination register. 


Hardware modifies the CF flag to indicate whether the value returned in the destination register is 
valid. If CF = 1, the value is valid. If CF = 0, the value is invalid. Software must test the state of the CF 
flag prior to using the value returned in the destination register to determine if the value is valid. If the 
returned value is invalid, software must execute the instruction again. Software should implement a 
retry limit to ensure forward progress of code. 
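
The retry discipline described above can be sketched as a bounded loop. This is an illustration under assumptions, not a definitive implementation: the callback type rand_step_fn, the helper rdrand_with_retry, and the stand-in fake_step are all hypothetical names. The callback models one execution of the instruction and returns the resulting CF state (true when the value is valid); on real hardware it would wrap the instruction itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* One attempt: returns the modeled CF (true = valid) and stores the value. */
typedef bool (*rand_step_fn)(uint64_t *out, void *ctx);

/* Hypothetical helper: retry up to max_tries times, then give up so that
 * the caller always makes forward progress. *out is written only on success. */
static bool rdrand_with_retry(rand_step_fn step, void *ctx,
                              unsigned max_tries, uint64_t *out)
{
    for (unsigned i = 0; i < max_tries; i++) {
        uint64_t v;
        if (step(&v, ctx)) {     /* CF = 1: value is valid */
            *out = v;
            return true;
        }
        /* CF = 0: value is invalid, execute the instruction again */
    }
    return false;                /* retry limit reached */
}

/* Stand-in for the hardware, for illustration only: fails the number of
 * times stored in *ctx, then returns a fixed value. */
static bool fake_step(uint64_t *out, void *ctx)
{
    unsigned *failures_left = ctx;
    if (*failures_left > 0) {
        (*failures_left)--;
        *out = 0;
        return false;
    }
    *out = 0x1234u;
    return true;
}
```

The choice of retry limit is left to software; the sketch only shows that the value is consumed after CF has been tested and never before.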


The execution of RDRAND clears the OF, SF, ZF, AF, and PF flags. 


Support for the RDRAND instruction is optional. On processors that support the instruction, CPUID
Fn0000_0001_ECX[RDRAND] = 1.


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Opcode Description 
RDRAND reg16    0F C7 /6    Load the destination register with a 16-bit random number.
RDRAND reg32    0F C7 /6    Load the destination register with a 32-bit random number.
RDRAND reg64    0F C7 /6    Load the destination register with a 64-bit random number.


Related Instructions 
RDSEED 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | 0  | 0  | 0  | 0  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported as indicated by CPUID Fn0000_0001_ECX[RDRAND] = 0.




RDSEED Read Random Seed 


Loads the destination register with a hardware-generated random “seed” value. 
The size of the returned value in bits is determined by the size of the destination register. 


Hardware modifies the CF flag to indicate whether the value returned in the destination register is 
valid. If CF = 1, the value is valid. If CF = 0, the value is invalid and will be returned as zero. Software 
must test the state of the CF flag prior to using the value returned in the destination register to 
determine if the value is valid. If the returned value is invalid, software must execute the instruction 
again. Software should implement a retry limit to ensure forward progress of code. 


The execution of RDSEED clears the OF, SF, ZF, AF, and PF flags. 


Mnemonic Opcode Description 

RDSEED reg16    0F C7 /7    Read 16-bit random seed
RDSEED reg32    0F C7 /7    Read 32-bit random seed
RDSEED reg64    0F C7 /7    Read 64-bit random seed


Related Instructions 


RDRAND 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | 0  | 0  | 0  | 0  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are
blank. Undefined flags are U.


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | Instruction not supported as indicated by CPUID Fn0000_0007_EBX_x0[RDSEED] = 0.
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RET (Near) Near Return from Called Procedure 


Returns from a procedure previously entered by a CALL near instruction. This form of the RET 
instruction returns to a calling procedure within the current code segment. 


This instruction pops the rIP from the stack, with the size of the pop determined by the operand size. 
The new rIP is then zero-extended to 64 bits. The RET instruction can accept an immediate value 
operand that it adds to the rSP after it pops the target rIP. This action skips over any parameters 
previously passed to the subroutine that are no longer needed.
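
The stack adjustment described above can be illustrated with a small model in C. This is a sketch under assumptions rather than a description of hardware behavior: the names ret_state and ret_near64_model are hypothetical, the operand size is fixed at 64 bits, the stack is modeled as a byte array indexed by an offset, and no fault or shadow-stack checks are performed.

```c
#include <stdint.h>
#include <string.h>

/* Modeled machine state after the return. */
typedef struct {
    uint64_t rip;
    uint64_t rsp;   /* byte offset into the modeled stack memory */
} ret_state;

/* Hypothetical model of RET / RET imm16 with a 64-bit operand size.
 * Pass imm16 = 0 to model the plain RET form. */
static ret_state ret_near64_model(const uint8_t *stack_mem, uint64_t rsp,
                                  uint16_t imm16)
{
    ret_state s;

    /* POP.q temp_RIP: read 8 bytes at the stack pointer (host byte order). */
    memcpy(&s.rip, stack_mem + rsp, sizeof s.rip);

    /* Step over the return address, then over imm16 bytes of parameters. */
    s.rsp = rsp + 8 + imm16;
    return s;
}
```

With the return address stored at offset 8 and an immediate of 16, the model returns with the stack pointer at offset 32, which is 8 bytes for the popped address plus the 16 bytes of discarded parameters.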


In 64-bit mode, the operand size defaults to 64 bits (eight bytes) without the need for a REX prefix. No 
prefix is available to encode a 32-bit operand size in 64-bit mode. 


See RET (Far) for information on far returns—returns to procedures located outside of the current 
code segment. For details about control-flow instructions, see “Control Transfers” in Volume 1, and 
“Control-Transfer Privilege Checks” in Volume 2. 


Mnemonic Opcode Description 
RET          C3       Near return to the calling procedure.
RET imm16    C2 iw    Near return to the calling procedure then pop the specified number of bytes from the stack.


Action 
RETN_START:

IF (OPCODE == retn imm16)
    temp_IMM = 16-bit immediate from the instruction, zero-extended to 64 bits
ELSE // (OPCODE == retn)
    temp_IMM = 0

IF (stack is not large enough for a v-sized pop)
    EXCEPTION [#SS(0)]

POP.v temp_RIP

IF ((64BIT_MODE) && (temp_RIP is non-canonical) ||
    (!64BIT_MODE) && (temp_RIP > CS.limit))
    EXCEPTION [#GP(0)]

IF (ShadowStacksEnabled at current CPL)
{
    IF (v == 2) // operand size = 16
    {
        temp_sstk_RIP = SSTK_READ_MEM.d [SSP]
        SSP = SSP + 4
    }
    ELSEIF (v == 4) // operand size = 32
    {
        temp_sstk_RIP = SSTK_READ_MEM.d [SSP]
        SSP = SSP + 4
    }
    ELSE // (v == 8) // operand size = 64
    {
        temp_sstk_RIP = SSTK_READ_MEM.q [SSP]
        SSP = SSP + 8
    }
    IF (temp_RIP != temp_sstk_RIP)
        EXCEPTION [#CP(RETN)]
} // end shadow stacks enabled

RSP.S = RSP + temp_IMM
RIP = temp_RIP
EXIT // end RETN


Related Instructions 


CALL (Near), CALL (Far), RET (Far) 


rFLAGS Affected 
None 
Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | The target offset exceeded the code segment limit or was non-canonical.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled.
Control-protection, #CP | | | X | The return address on the program stack did not match the address on the shadow stack.
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RET (Far) Far Return from Called Procedure 


Returns from a procedure previously entered by a CALL Far instruction. This form of the RET 
instruction returns to a calling procedure in a different segment than the current code segment. It can 
return to the same CPL or to a less privileged CPL. 


RET Far pops a target CS and rIP from the stack. If the new code segment is less privileged than the 
current code segment, the stack pointer is incremented by the number of bytes indicated by the 
immediate operand, if present; then a new SS and rSP are also popped from the stack. 


The final value of rSP is incremented by the number of bytes indicated by the immediate operand, if 
present. This action skips over the parameters (previously passed to the subroutine) that are no longer 
needed. 


All stack pops are determined by the operand size. If necessary, the target rIP is zero-extended to 64 
bits before assuming program control. 


If the CPL changes, the data segment selectors are set to NULL for any of the data segments (DS, ES, 
FS, GS) not accessible at the new CPL. 


See RET (Near) for information on near returns—returns to procedures located inside the current code 
segment. For details about control-flow instructions, see “Control Transfers” in Volume 1, and 
“Control-Transfer Privilege Checks” in Volume 2. 


Mnemonic Opcode Description 
RETF          CB       Far return to the calling procedure.
RETF imm16    CA iw    Far return to the calling procedure, then pop the specified number of bytes from the stack.


Action 
// For functions READ_DESCRIPTOR, ShadowStacksEnabled 
// see "Pseudocode Definition" on page 57 


RETF_START: 


IF (PROTECTED_MODE) 
RETF_PROTECTED 

ELSE // (REAL_MODE or VIRTUAL_MODE) 
RETF_REAL_OR_VIRTUAL 


RETF_REAL_OR_VIRTUAL: 


IF (OPCODE == retf imm16) 

temp_IMM = 16 bit immediate operand, zero-extended to 64 bits 
ELSE // (OPCODE == retf) 

temp_IMM = 0 
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POP.v temp_RIP 
POP.v temp_CS 


IF (temp_RIP > CS.limit) 
EXCEPTION [#GP(0) ] 


CS.sel = temp_CS
CS.base = temp_CS SHL 4 


RSP.S = RSP + temp_IMM 
RIP = temp_RIP 
EXIT // end RETF real or virtual modes 


RETF_PROTECTED: 


IF (OPCODE == retf imm16) 

temp_IMM = 16 bit immediate operand, zero-extended to 64 bits 
ELSE // (OPCODE == retf) 

temp_IMM = 0 


POP.v temp_RIP 
POP.v temp_CS 
temp_CPL = temp_CS.rpl 


IF (CPL == temp_CPL) // not changing privilege level 
RETF_PROTECTED_TO_SAME_PRIV 

ELSE 
RETF_PROTECTED_TO_OUTER_PRIV 


RETF_PROTECTED_TO_SAME_PRIV: 
// CPL = temp_CS.rpl (RETF to same privilege level) 
CS = READ_DESCRIPTOR (temp_CS, iret_chk) 


IF ((64BIT_MODE) && (temp_RIP is non-canonical) || 
(!64BIT_MODE) && (temp_RIP > CS.limit)) 
EXCEPTION [#GP(0) ] 


RIP = temp_RIP
RSP.S = RSP + temp_IMM

IF (ShadowStacksEnabled(current CPL))
{
    IF (SSP[2:0] != 0)
        EXCEPTION [#CP(RETF/IRET)] // SSP must be 8-byte aligned
    temp_sstk_CS = SSTK_READ_MEM.q [SSP + 16] // read CS from sstk
    temp_sstk_LIP = SSTK_READ_MEM.q [SSP + 8] // read LIP
    temp_sstk_prevSSP = SSTK_READ_MEM.q [SSP] // read previous SSP
    SSP = SSP + 24

    IF (temp_CS != temp_sstk_CS)
        EXCEPTION [#CP(RETF/IRET)] // CS mismatch
    IF ((CS.base + RIP) != temp_sstk_LIP)
        EXCEPTION [#CP(RETF/IRET)] // LIP mismatch
    IF (temp_sstk_prevSSP[1:0] != 0)
        EXCEPTION [#CP(RETF/IRET)] // prevSSP must be 4-byte aligned
    IF ((COMPATIBILITY_MODE) && (temp_sstk_prevSSP[63:32] != 0))
        EXCEPTION [#GP(0)] // prevSSP must be <4GB in compat mode
    IF ((64BIT_MODE) && (temp_sstk_prevSSP is non-canonical))
        EXCEPTION [#GP(0)]

    SSP = temp_sstk_prevSSP
} // end shadow stacks enabled at current CPL


EXIT // end RETF to same privilege level 


RETF_PROTECTED_TO_OUTER_PRIV: 

// CPL != temp_CS.rpl (RETF changing privilege level) 
POP.v temp_RSP 
POP.v temp_SS 


CS = READ_DESCRIPTOR (temp_CS, iret_chk) 
temp_oldCPL = CPL 


IF ((64BIT_MODE) && (temp_RIP is non-canonical) || 
(!64BIT_MODE) && (temp_RIP > CS.limit)) 
EXCEPTION [#GP(0) ] 


CPL = temp_CPL 
SS = READ_DESCRIPTOR (temp_SS, ss_chk) 


RIP = temp_RIP
RSP.S = temp_RSP + temp_IMM

IF (ShadowStacksEnabled(old CPL))
{
    IF (SSP[2:0] != 0)
        EXCEPTION [#CP(RETF/IRET)] // SSP must be 8-byte aligned
    temp_sstk_CS = SSTK_READ_MEM.q [SSP + 16] // read CS from sstk
    temp_sstk_LIP = SSTK_READ_MEM.q [SSP + 8] // read LIP
    temp_SSP = SSTK_READ_MEM.q [SSP] // read previous SSP
    SSP = SSP + 24
    IF (temp_CS != temp_sstk_CS)
        EXCEPTION [#CP(RETF/IRET)] // CS mismatch
    IF ((CS.base + RIP) != temp_sstk_LIP)
        EXCEPTION [#CP(RETF/IRET)] // LIP mismatch
    IF (temp_SSP[1:0] != 0)
        EXCEPTION [#CP(RETF/IRET)] // prevSSP must be 4-byte aligned
    IF ((COMPATIBILITY_MODE) && (tmp_sstk_prevSSP[63:32] != 0))
        EXCEPTION [#GP(0)] // prevSSP must be <4GB in compat mode
}
temp_oldSSP = SSP

IF (ShadowStacksEnabled(new CPL))
{
    IF ((ShadowStacksEnabled(CPL 3)) && (old_CPL == 3))
        temp_SSP = PL3_SSP
    IF ((COMPATIBILITY_MODE) && (temp_SSP[63:32] != 0))
        EXCEPTION [#GP(0)] // SSP must be <4GB in compat mode
    SSP = temp_SSP
}

IF (ShadowStacksEnabled(old CPL))
{ // check shadow stack token and clear busy
    bool invalid_token = FALSE
    < start atomic section >
    temp_Token = SSTK_READ_MEM.q [temp_oldSSP] // read supervisor sstk token
    IF ((temp_Token AND 0x01) != 1)
        invalid_token = TRUE // token busy bit must be 1
    IF ((temp_Token AND ~0x01) != temp_oldSSP)
        invalid_token = TRUE // address in token must = old SSP
    IF (!invalid_token)
        temp_Token = temp_Token AND ~0x01 // if valid clear token busy bit
    SSTK_WRITE_MEM.q [temp_oldSSP] = temp_Token // writeback token
    < end atomic section >
} // end shadow stacks enabled

FOR (seg = ES, DS, FS, GS)
    IF ((seg.sel == NULL) || ((seg.attr.dpl < CPL) &&
        ((seg.attr.type == 'data') ||
         (seg.attr.type == 'non-conforming-code'))))
        seg = NULL // can't use lower DPL data segment at higher CPL
                   // also clears RPL of any null selectors

Related Instructions 


CALL (Near), CALL (Far), RET (Near) 


rFLAGS Affected 


None 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception
Segment not present, #NP (selector) | | | X | The return code segment was marked not present.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
Stack, #SS (selector) | | | X | The return stack segment was marked not present.
General protection, #GP | X | X | X | The target offset exceeded the code segment limit or was non-canonical.
General protection, #GP (selector) | | | X | The return code selector was a null selector.
General protection, #GP (selector) | | | X | The return stack selector was a null selector and the return mode was non-64-bit mode or CPL was 3.
General protection, #GP (selector) | | | X | The return code or stack descriptor exceeded the descriptor table limit.
General protection, #GP (selector) | | | X | The return code or stack selector’s TI bit was set but the LDT selector was a null selector.
General protection, #GP (selector) | | | X | The segment descriptor for the return code was not a code segment.
General protection, #GP (selector) | | | X | The RPL of the return code segment selector was less than the CPL.
General protection, #GP (selector) | | | X | The return code segment was non-conforming and the segment selector’s DPL was not equal to the RPL of the code segment’s segment selector.
General protection, #GP (selector) | | | X | The return code segment was conforming and the segment selector’s DPL was greater than the RPL of the code segment’s segment selector.
General protection, #GP (selector) | | | X | The segment descriptor for the return stack was not a writable data segment.
General protection, #GP (selector) | | | X | The stack segment descriptor DPL was not equal to the RPL of the return code segment selector.
General protection, #GP (selector) | | | X | The stack segment selector RPL was not equal to the RPL of the return code segment selector.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled.
Control-protection, #CP | | | X | The return address on the program stack did not match the address on the shadow stack, or the previous SSP is not 4-byte aligned.
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ROL Rotate Left 


Rotates the bits of a register or memory location (first operand) to the left (toward the more significant 
bit positions) by the number of bit positions in an unsigned immediate value or the CL register (second 
operand). The bits rotated out left are rotated back in at the right end (lsb) of the first operand location.


The processor masks the upper three bits of the count operand, thus restricting the count to a number 
between 0 and 31. When the destination is 64 bits wide, it masks the upper two bits of the count, 
providing a count in the range of 0 to 63. 


After completing the rotation, the instruction sets the CF flag to the last bit rotated out (the lsb of the
result). For 1-bit rotates, the instruction sets the OF flag to the logical xor of the CF bit (after the 
rotate) and the most significant bit of the result. When the rotate count is greater than 1, the OF flag is 
undefined. When the rotate count is 0, no flags are affected. 
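
The count masking and flag rules above can be modeled for a 32-bit operand as follows. This is a sketch, not a definitive implementation: the names rol32_result and rol32_model are hypothetical, and the value -1 is used as an assumption of this sketch to mean "flag unaffected or undefined".

```c
#include <stdint.h>

typedef struct {
    uint32_t value;
    int cf;   /* new CF, or -1 if the flag is unaffected */
    int of;   /* new OF, or -1 if unaffected or undefined */
} rol32_result;

/* Hypothetical model of ROL reg/mem32 with an immediate or CL count. */
static rol32_result rol32_model(uint32_t value, unsigned count)
{
    rol32_result r = { value, -1, -1 };

    count &= 31u;                 /* upper three bits of the count are masked */
    if (count == 0)
        return r;                 /* count of 0: no flags are affected */

    r.value = (value << count) | (value >> (32u - count));
    r.cf = (int)(r.value & 1u);   /* last bit rotated out is now the lsb */
    if (count == 1)
        r.of = r.cf ^ (int)(r.value >> 31);   /* CF xor msb of the result */
    return r;
}
```

For example, rotating 0x12345678 left by 4 gives 0x23456781 with CF = 1, and a count of 32 is masked to 0 and leaves both the operand and the flags unchanged.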


Mnemonic Opcode Description 
ROL reg/mem8, 1        D0 /0       Rotate an 8-bit register or memory operand left 1 bit.
ROL reg/mem8, CL       D2 /0       Rotate an 8-bit register or memory operand left the number of bits specified in the CL register.
ROL reg/mem8, imm8     C0 /0 ib    Rotate an 8-bit register or memory operand left the number of bits specified by an 8-bit immediate value.
ROL reg/mem16, 1       D1 /0       Rotate a 16-bit register or memory operand left 1 bit.
ROL reg/mem16, CL      D3 /0       Rotate a 16-bit register or memory operand left the number of bits specified in the CL register.
ROL reg/mem16, imm8    C1 /0 ib    Rotate a 16-bit register or memory operand left the number of bits specified by an 8-bit immediate value.
ROL reg/mem32, 1       D1 /0       Rotate a 32-bit register or memory operand left 1 bit.
ROL reg/mem32, CL      D3 /0       Rotate a 32-bit register or memory operand left the number of bits specified in the CL register.
ROL reg/mem32, imm8    C1 /0 ib    Rotate a 32-bit register or memory operand left the number of bits specified by an 8-bit immediate value.
ROL reg/mem64, 1       D1 /0       Rotate a 64-bit register or memory operand left 1 bit.
ROL reg/mem64, CL      D3 /0       Rotate a 64-bit register or memory operand left the number of bits specified in the CL register.
ROL reg/mem64, imm8    C1 /0 ib    Rotate a 64-bit register or memory operand left the number of bits specified by an 8-bit immediate value.


Related Instructions 


RCL, RCR, ROR 
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rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    |    |    |    |    | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | The destination operand was in a non-writable segment.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled.
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ROR Rotate Right 


Rotates the bits of a register or memory location (first operand) to the right (toward the less significant 
bit positions) by the number of bit positions in an unsigned immediate value or the CL register (second 
operand). The bits rotated out right are rotated back in at the left end (the most significant bit) of the 
first operand location. 


The processor masks the upper three bits of the count operand, thus restricting the count to a number 
between 0 and 31. When the destination is 64 bits wide, the processor masks the upper two bits of the 
count, providing a count in the range of 0 to 63. 


After completing the rotation, the instruction sets the CF flag to the last bit rotated out (the most 
significant bit of the result). For 1-bit rotates, the instruction sets the OF flag to the logical xor of the 
two most significant bits of the result. When the rotate count is greater than 1, the OF flag is undefined. 
When the rotate count is 0, no flags are affected. 
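
The same rules can be modeled for a 32-bit rotate right. This is a sketch under the same kind of assumptions as any software model of the instruction: the names ror32_result and ror32_model are hypothetical, and -1 is used here to mean "flag unaffected or undefined".

```c
#include <stdint.h>

typedef struct {
    uint32_t value;
    int cf;   /* new CF, or -1 if the flag is unaffected */
    int of;   /* new OF, or -1 if unaffected or undefined */
} ror32_result;

/* Hypothetical model of ROR reg/mem32 with an immediate or CL count. */
static ror32_result ror32_model(uint32_t value, unsigned count)
{
    ror32_result r = { value, -1, -1 };

    count &= 31u;                  /* upper three bits of the count are masked */
    if (count == 0)
        return r;                  /* count of 0: no flags are affected */

    r.value = (value >> count) | (value << (32u - count));
    r.cf = (int)(r.value >> 31);   /* last bit rotated out is now the msb */
    if (count == 1)                /* xor of the two most significant result bits */
        r.of = (int)(((r.value >> 31) ^ (r.value >> 30)) & 1u);
    return r;
}
```

For example, rotating the value 1 right by one position gives 0x80000000 with CF = 1 and OF = 1.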


Mnemonic Opcode Description 
ROR reg/mem8, 1        D0 /1       Rotate an 8-bit register or memory location right 1 bit.
ROR reg/mem8, CL       D2 /1       Rotate an 8-bit register or memory location right the number of bits specified in the CL register.
ROR reg/mem8, imm8     C0 /1 ib    Rotate an 8-bit register or memory location right the number of bits specified by an 8-bit immediate value.
ROR reg/mem16, 1       D1 /1       Rotate a 16-bit register or memory location right 1 bit.
ROR reg/mem16, CL      D3 /1       Rotate a 16-bit register or memory location right the number of bits specified in the CL register.
ROR reg/mem16, imm8    C1 /1 ib    Rotate a 16-bit register or memory location right the number of bits specified by an 8-bit immediate value.
ROR reg/mem32, 1       D1 /1       Rotate a 32-bit register or memory location right 1 bit.
ROR reg/mem32, CL      D3 /1       Rotate a 32-bit register or memory location right the number of bits specified in the CL register.
ROR reg/mem32, imm8    C1 /1 ib    Rotate a 32-bit register or memory location right the number of bits specified by an 8-bit immediate value.
ROR reg/mem64, 1       D1 /1       Rotate a 64-bit register or memory location right 1 bit.
ROR reg/mem64, CL      D3 /1       Rotate a 64-bit register or memory operand right the number of bits specified in the CL register.
ROR reg/mem64, imm8    C1 /1 ib    Rotate a 64-bit register or memory operand right the number of bits specified by an 8-bit immediate value.


Related Instructions 


RCL, RCR, ROL 
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rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    |    |    |    |    | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | The destination operand was in a non-writable segment.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled.
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RORX Rotate Right Extended 


Rotates the bits of the source operand right (toward the least-significant bit) by the number of bit 
positions specified in an immediate operand and writes the result to the destination. Does not affect the 
arithmetic flags. 
This instruction has three operands: 

RORX dest, src, rot_cnt 
On each right-shift, the bit shifted out of the least-significant bit position is copied to the most- 
significant bit. This instruction performs a non-destructive operation; that is, the contents of the source 


operand are unaffected by the operation, unless the destination and source are the same general- 
purpose register. 


In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand 
size is 64 bits; if VEX.W is 0, the operand size is 32 bits. In 32-bit mode, VEX.W is ignored. 16-bit 
operands are not supported. 


The destination (dest) is a general-purpose register and the source (src) is either a general-purpose 
register or a memory operand. The rotate count rot_cnt is encoded in an immediate byte. When the 
operand size is 32, bits [7:5] of the immediate byte are ignored; when the operand size is 64, bits [7:6] 
of the immediate byte are ignored. 
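
The treatment of the immediate byte can be modeled in C. This is a minimal sketch: the function names rorx32_model and rorx64_model are hypothetical, and the functions return the destination value without touching any modeled flags, in line with the statement that the arithmetic flags are not affected.

```c
#include <stdint.h>

/* Hypothetical model of RORX reg32, reg/mem32, imm8. */
static uint32_t rorx32_model(uint32_t src, uint8_t imm8)
{
    unsigned count = imm8 & 0x1Fu;   /* bits [7:5] of the immediate are ignored */
    if (count == 0)
        return src;                  /* avoid a shift by the full operand width */
    return (src >> count) | (src << (32u - count));
}

/* Hypothetical model of RORX reg64, reg/mem64, imm8. */
static uint64_t rorx64_model(uint64_t src, uint8_t imm8)
{
    unsigned count = imm8 & 0x3Fu;   /* bits [7:6] of the immediate are ignored */
    if (count == 0)
        return src;
    return (src >> count) | (src << (64u - count));
}
```

With a 32-bit operand, an immediate of 0x28 behaves like an immediate of 8, because only bits [4:0] take part in the rotate count.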


This instruction is a BMI2 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


Mnemonic                       VEX    RXB.map_select    W.vvvv.L.pp    Opcode
RORX reg32, reg/mem32, imm8    C4     RXB.03            0.1111.0.11    F0 /r ib
RORX reg64, reg/mem64, imm8    C4     RXB.03            1.1111.0.11    F0 /r ib


Related Instructions 


SARX, SHLX, SHRX 


rFLAGS Affected 


None. 
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Exceptions 
Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | BMI2 instructions are only recognized in protected mode.
Invalid opcode, #UD | | | X | BMI2 instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI2] = 0.
Invalid opcode, #UD | | | X | VEX.L is 1.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | | X | An unaligned memory reference was performed while alignment checking was enabled.
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SAHF Store AH into Flags 


Loads the SF, ZF, AF, PF, and CF flags of the EFLAGS register with values from the corresponding 
bits in the AH register (bits 7, 6, 4, 2, and 0, respectively). The instruction ignores bits 1, 3, and 5 of 
register AH; it sets those bits in the EFLAGS register to 1, 0, and 0, respectively. 
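
The bit selection described above can be written as a mask operation. This is a sketch with a hypothetical function name, sahf_model; it models only the value of the low byte of EFLAGS and leaves all higher bits of the supplied EFLAGS image unchanged.

```c
#include <stdint.h>

/* Hypothetical model of SAHF: returns the EFLAGS image after the
 * instruction, given the previous image and the AH register value. */
static uint32_t sahf_model(uint32_t eflags, uint8_t ah)
{
    /* SF, ZF, AF, PF, CF are bits 7, 6, 4, 2, and 0: mask 0xD5. */
    const uint32_t flag_mask = 0xD5u;

    /* Bits 1, 3, and 5 of AH are ignored; bit 1 is written as 1,
     * bits 3 and 5 as 0. */
    uint32_t low = ((uint32_t)ah & flag_mask) | 0x02u;

    return (eflags & ~0xFFu) | low;
}
```

For AH = 0xFF the low byte of EFLAGS becomes 0xD7: all five flags set, bit 1 set, and bits 3 and 5 clear.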


The SAHF instruction is available in 64-bit mode if CPUID Fn8000_0001_ECX[LahfSahf] = 1. It is
always available in the other operating modes (including compatibility mode).


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Opcode Description 


SAHF    9E    Loads the sign flag, the zero flag, the auxiliary flag, the parity flag, and the carry flag from the AH register into the lower 8 bits of the EFLAGS register.


Related Instructions 
LAHF 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       |    |    |    |    | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U. 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | | | X | The SAHF instruction is not supported in 64-bit mode, as indicated by CPUID Fn8000_0001_ECX[LahfSahf] = 0.
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SHL 


Shifts the bits of a register or memory location (first operand) to the left through the CF bit by the 
number of bit positions in an unsigned immediate value or the CL register (second operand). The 
instruction discards bits shifted out of the CF flag. For each bit shift, the SAL instruction clears the 
least-significant bit to 0. At the end of the shift operation, the CF flag contains the last bit shifted out of 
the first operand. 


The processor masks the upper three bits of the count operand, thus restricting the count to a number 
between 0 and 31. When the destination is 64 bits wide, the processor masks the upper two bits of the 
count, providing a count in the range of 0 to 63. 


The effect of this instruction is multiplication by powers of two. 


For 1-bit shifts, the instruction sets the OF flag to the logical xor of the CF bit (after the shift) and the 
most significant bit of the result. When the shift count is greater than 1, the OF flag is undefined. 


If the shift count is 0, no flags are modified. 
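
The shift and flag rules above can be modeled for a 32-bit operand. This is a sketch, not a definitive implementation: the names shl32_result and shl32_model are hypothetical, and -1 is used as an assumption of this sketch to mean "flag unaffected or undefined". Only CF and OF are modeled.

```c
#include <stdint.h>

typedef struct {
    uint32_t value;
    int cf;   /* new CF, or -1 if the flag is unaffected */
    int of;   /* new OF, or -1 if unaffected or undefined */
} shl32_result;

/* Hypothetical model of SAL/SHL reg/mem32 with an immediate or CL count. */
static shl32_result shl32_model(uint32_t value, unsigned count)
{
    shl32_result r = { value, -1, -1 };

    count &= 31u;                 /* upper three bits of the count are masked */
    if (count == 0)
        return r;                 /* shift count of 0: no flags are modified */

    r.cf = (int)((value >> (32u - count)) & 1u);   /* last bit shifted out */
    r.value = value << count;     /* vacated low bits are cleared to 0 */
    if (count == 1)
        r.of = r.cf ^ (int)(r.value >> 31);        /* CF xor msb of the result */
    return r;
}
```

For example, shifting 3 left by 4 yields 48, which matches multiplication by 2 to the power 4, with CF = 0.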


SHL is an alias to the SAL instruction. 


Mnemonic            | Opcode   | Description
SAL reg/mem8, 1     | D0 /4    | Shift an 8-bit register or memory location left 1 bit.
SAL reg/mem8, CL    | D2 /4    | Shift an 8-bit register or memory location left the number of bits specified in the CL register.
SAL reg/mem8, imm8  | C0 /4 ib | Shift an 8-bit register or memory location left the number of bits specified by an 8-bit immediate value.
SAL reg/mem16, 1    | D1 /4    | Shift a 16-bit register or memory location left 1 bit.
SAL reg/mem16, CL   | D3 /4    | Shift a 16-bit register or memory location left the number of bits specified in the CL register.
SAL reg/mem16, imm8 | C1 /4 ib | Shift a 16-bit register or memory location left the number of bits specified by an 8-bit immediate value.
SAL reg/mem32, 1    | D1 /4    | Shift a 32-bit register or memory location left 1 bit.
SAL reg/mem32, CL   | D3 /4    | Shift a 32-bit register or memory location left the number of bits specified in the CL register.
SAL reg/mem32, imm8 | C1 /4 ib | Shift a 32-bit register or memory location left the number of bits specified by an 8-bit immediate value.
SAL reg/mem64, 1    | D1 /4    | Shift a 64-bit register or memory location left 1 bit.
SAL reg/mem64, CL   | D3 /4    | Shift a 64-bit register or memory location left the number of bits specified in the CL register.
SAL reg/mem64, imm8 | C1 /4 ib | Shift a 64-bit register or memory location left the number of bits specified by an 8-bit immediate value.
SHL reg/mem8, 1     | D0 /4    | Shift an 8-bit register or memory location left 1 bit.
SHL reg/mem8, CL    | D2 /4    | Shift an 8-bit register or memory location left the number of bits specified in the CL register.
SHL reg/mem8, imm8  | C0 /4 ib | Shift an 8-bit register or memory location left the number of bits specified by an 8-bit immediate value.
SHL reg/mem16, 1    | D1 /4    | Shift a 16-bit register or memory location left 1 bit.
SHL reg/mem16, CL   | D3 /4    | Shift a 16-bit register or memory location left the number of bits specified in the CL register.
SHL reg/mem16, imm8 | C1 /4 ib | Shift a 16-bit register or memory location left the number of bits specified by an 8-bit immediate value.
SHL reg/mem32, 1    | D1 /4    | Shift a 32-bit register or memory location left 1 bit.
SHL reg/mem32, CL   | D3 /4    | Shift a 32-bit register or memory location left the number of bits specified in the CL register.
SHL reg/mem32, imm8 | C1 /4 ib | Shift a 32-bit register or memory location left the number of bits specified by an 8-bit immediate value.
SHL reg/mem64, 1    | D1 /4    | Shift a 64-bit register or memory location left 1 bit.
SHL reg/mem64, CL   | D3 /4    | Shift a 64-bit register or memory location left the number of bits specified in the CL register.
SHL reg/mem64, imm8 | C1 /4 ib | Shift a 64-bit register or memory location left the number of bits specified by an 8-bit immediate value.


Related Instructions 
SAR, SHR, SHLD, SHRD 
rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | U  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.
Exceptions
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | The destination operand was in a non-writable segment.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
SAR Shift Arithmetic Right


Shifts the bits of a register or memory location (first operand) to the right through the CF bit by the 
number of bit positions in an unsigned immediate value or the CL register (second operand). The 
instruction discards bits shifted out of the CF flag. At the end of the shift operation, the CF flag 
contains the last bit shifted out of the first operand. 


The SAR instruction does not change the sign bit of the target operand. For each bit shift, it copies the 
sign bit to the next bit, preserving the sign of the result. 


The processor masks the upper three bits of the count operand, thus restricting the count to a number 
between 0 and 31. When the destination is 64 bits wide, the processor masks the upper two bits of the 
count, providing a count in the range of 0 to 63. 


For 1-bit shifts, the instruction clears the OF flag to 0. When the shift count is greater than 1, the OF 
flag is undefined. 


If the shift count is 0, no flags are modified. 


Although the SAR instruction effectively divides the operand by a power of 2, its behavior differs from
that of the IDIV instruction. For example, shifting -11 (FFFFFFF5h) right by two bits (that is,
dividing -11 by 4) gives a result of FFFFFFFDh, or -3, whereas the IDIV instruction for dividing -11
by 4 gives a quotient of -2. This is because the IDIV instruction rounds the quotient toward zero, whereas
the SAR instruction rounds it toward negative infinity; the two roundings coincide for positive
dividends. So, for positive operands, SAR behaves like the corresponding IDIV instruction.
For negative operands, it gives the same result if and only if all the shifted-out bits are zeroes;
otherwise, the result is smaller by 1.
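
The rounding difference between SAR and IDIV can be checked with a small model. The following Python sketch is illustrative only; the helper names sar32 and idiv are assumptions made for this example. It relies on the fact that Python's >> operator on integers is an arithmetic shift.

```python
def sar32(value, count):
    """Arithmetic right shift of a signed 32-bit value; count is masked to 5 bits."""
    return value >> (count & 0x1F)    # Python's >> rounds toward negative infinity

def idiv(dividend, divisor):
    """Signed quotient rounded toward zero, as produced by IDIV."""
    q = abs(dividend) // abs(divisor)
    return q if (dividend < 0) == (divisor < 0) else -q

# -11 is FFFFFFF5h and -3 is FFFFFFFDh as 32-bit two's-complement values.
example_in = -11 & 0xFFFFFFFF
example_out = sar32(-11, 2) & 0xFFFFFFFF
```

With these helpers, sar32(-11, 2) is -3 while idiv(-11, 4) is -2; for -12, whose two shifted-out bits are zero, both give -3.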


Mnemonic            | Opcode   | Description
SAR reg/mem8, 1     | D0 /7    | Shift a signed 8-bit register or memory operand right 1 bit.
SAR reg/mem8, CL    | D2 /7    | Shift a signed 8-bit register or memory operand right the number of bits specified in the CL register.
SAR reg/mem8, imm8  | C0 /7 ib | Shift a signed 8-bit register or memory operand right the number of bits specified by an 8-bit immediate value.
SAR reg/mem16, 1    | D1 /7    | Shift a signed 16-bit register or memory operand right 1 bit.
SAR reg/mem16, CL   | D3 /7    | Shift a signed 16-bit register or memory operand right the number of bits specified in the CL register.
SAR reg/mem16, imm8 | C1 /7 ib | Shift a signed 16-bit register or memory operand right the number of bits specified by an 8-bit immediate value.
SAR reg/mem32, 1    | D1 /7    | Shift a signed 32-bit register or memory location right 1 bit.
SAR reg/mem32, CL   | D3 /7    | Shift a signed 32-bit register or memory location right the number of bits specified in the CL register.
SAR reg/mem32, imm8 | C1 /7 ib | Shift a signed 32-bit register or memory location right the number of bits specified by an 8-bit immediate value.
SAR reg/mem64, 1    | D1 /7    | Shift a signed 64-bit register or memory location right 1 bit.
SAR reg/mem64, CL   | D3 /7    | Shift a signed 64-bit register or memory location right the number of bits specified in the CL register.
SAR reg/mem64, imm8 | C1 /7 ib | Shift a signed 64-bit register or memory location right the number of bits specified by an 8-bit immediate value.

Related Instructions
SAL, SHL, SHR, SHLD, SHRD


rFLAGS Affected 

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | U  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions 
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | The destination operand was in a non-writable segment.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
SARX Shift Right Arithmetic Extended


Shifts the bits of the first source operand right (toward the least-significant bit) arithmetically by the 
number of bit positions specified in the second source operand and writes the result to the destination. 
Does not affect the arithmetic flags. 
This instruction has three operands: 

SARX dest, src, shft_cnt 


On each right-shift, the most-significant bit (the sign bit) is replicated. This instruction performs a non- 
destructive operation; that is, the contents of the source operand are unaffected by the operation, unless 
the destination and source are the same general-purpose register. 


In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand 
size is 64 bits; if VEX.W is 0, the operand size is 32 bits. In 32-bit mode, VEX.W is ignored. 16-bit 
operands are not supported. 


The destination (dest) is a general-purpose register and the first source (src) is either a general-purpose 
register or a memory operand. The second source operand shft_cnt is a general-purpose register. When
the operand size is 32, bits [31:5] of shft_cnt are ignored; when the operand size is 64, bits [63:6] of 
shft_cnt are ignored. 
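
The operand-size and count rules above can be illustrated with a small model. The following Python sketch is not a definition of the instruction; the function name sarx and its size parameter are assumptions made for this example. It returns only the destination value, since the instruction does not affect the arithmetic flags.

```python
def sarx(src, shft_cnt, size=32):
    """Model SARX for a 32-bit or 64-bit operand size; returns the destination value."""
    mask = (1 << size) - 1
    count = shft_cnt & (size - 1)     # bits [31:5] or [63:6] of shft_cnt are ignored
    src &= mask
    if src >> (size - 1):             # reinterpret the source as a signed value
        src -= 1 << size
    return (src >> count) & mask      # the sign bit is replicated on each shift
```

For instance, sarx(0x80000000, 36) equals sarx(0x80000000, 4) in this model, because a count of 36 reduces to 4 for a 32-bit operand.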


This instruction is a BMI2 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic                     | Encoding
                             | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
SARX reg32, reg/mem32, reg32 | C4  | RXB.02         | 0.src2.0.10 | F7 /r
SARX reg64, reg/mem64, reg64 | C4  | RXB.02         | 1.src2.0.10 | F7 /r


Related Instructions 
RORX, SHLX, SHRX 


rFLAGS Affected 


None. 
Exceptions
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD     | X    | X            |           | BMI2 instructions are only recognized in protected mode.
                        |      |              | X         | BMI2 instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI2] = 0.
                        |      |              | X         | VEX.L is 1.
Stack, #SS              |      |              | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP |      |              | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      |              | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      |              | X         | An unaligned memory reference was performed while alignment checking was enabled.
SBB Subtract with Borrow


Subtracts an immediate value or the value in a register or a memory location (second operand) from a 
register or a memory location (first operand), and stores the result in the first operand location. If the 
carry flag (CF) is 1, the instruction subtracts 1 from the result. Otherwise, it operates like SUB. 


The SBB instruction sign-extends immediate value operands to the length of the first operand size. 


This instruction evaluates the result for both signed and unsigned data types and sets the OF and CF 
flags to indicate a borrow in a signed or unsigned result, respectively. It sets the SF flag to indicate the 
sign of a signed result. 


This instruction is useful for multibyte (multiword) numbers because it takes into account the borrow 
from a previous SUB instruction. 
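
The multiword use described above can be sketched as a 128-bit subtraction built from two 64-bit steps: SUB on the low halves, then SBB on the high halves. This Python model is illustrative only; the names sub64, sbb64, and sub128 are assumptions made for this example, and only the CF result is modeled.

```python
MASK64 = (1 << 64) - 1

def sub64(a, b):
    """SUB on 64-bit values: returns (result, cf); cf = 1 signals an unsigned borrow."""
    return (a - b) & MASK64, int(a < b)

def sbb64(a, b, cf):
    """SBB on 64-bit values: subtracts b and the incoming borrow cf; returns (result, cf)."""
    return (a - b - cf) & MASK64, int(a < b + cf)

def sub128(a_lo, a_hi, b_lo, b_hi):
    """128-bit subtraction: SUB the low halves, then SBB the high halves."""
    lo, cf = sub64(a_lo, b_lo)
    hi, cf = sbb64(a_hi, b_hi, cf)    # the borrow from the low half propagates here
    return lo, hi, cf
```

Subtracting 1 from 2^64 (low half 0, high half 1) borrows out of the low half, so the SBB step reduces the high half to 0 and leaves CF clear.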


The forms of the SBB instruction that write to memory support the LOCK prefix. For details about the 
LOCK prefix, see “Lock Prefix” on page 11. 


Mnemonic             | Opcode   | Description
SBB AL, imm8         | 1C ib    | Subtract an immediate 8-bit value from the AL register with borrow.
SBB AX, imm16        | 1D iw    | Subtract an immediate 16-bit value from the AX register with borrow.
SBB EAX, imm32       | 1D id    | Subtract an immediate 32-bit value from the EAX register with borrow.
SBB RAX, imm32       | 1D id    | Subtract a sign-extended immediate 32-bit value from the RAX register with borrow.
SBB reg/mem8, imm8   | 80 /3 ib | Subtract an immediate 8-bit value from an 8-bit register or memory location with borrow.
SBB reg/mem16, imm16 | 81 /3 iw | Subtract an immediate 16-bit value from a 16-bit register or memory location with borrow.
SBB reg/mem32, imm32 | 81 /3 id | Subtract an immediate 32-bit value from a 32-bit register or memory location with borrow.
SBB reg/mem64, imm32 | 81 /3 id | Subtract a sign-extended immediate 32-bit value from a 64-bit register or memory location with borrow.
SBB reg/mem16, imm8  | 83 /3 ib | Subtract a sign-extended 8-bit immediate value from a 16-bit register or memory location with borrow.
SBB reg/mem32, imm8  | 83 /3 ib | Subtract a sign-extended 8-bit immediate value from a 32-bit register or memory location with borrow.
SBB reg/mem64, imm8  | 83 /3 ib | Subtract a sign-extended 8-bit immediate value from a 64-bit register or memory location with borrow.
SBB reg/mem8, reg8   | 18 /r    | Subtract the contents of an 8-bit register from an 8-bit register or memory location with borrow.
SBB reg/mem16, reg16 | 19 /r    | Subtract the contents of a 16-bit register from a 16-bit register or memory location with borrow.
SBB reg/mem32, reg32 | 19 /r    | Subtract the contents of a 32-bit register from a 32-bit register or memory location with borrow.
SBB reg/mem64, reg64 | 19 /r    | Subtract the contents of a 64-bit register from a 64-bit register or memory location with borrow.
SBB reg8, reg/mem8   | 1A /r    | Subtract the contents of an 8-bit register or memory location from the contents of an 8-bit register with borrow.
SBB reg16, reg/mem16 | 1B /r    | Subtract the contents of a 16-bit register or memory location from the contents of a 16-bit register with borrow.
SBB reg32, reg/mem32 | 1B /r    | Subtract the contents of a 32-bit register or memory location from the contents of a 32-bit register with borrow.
SBB reg64, reg/mem64 | 1B /r    | Subtract the contents of a 64-bit register or memory location from the contents of a 64-bit register with borrow.
Related Instructions 
SUB, ADD, ADC 
rFLAGS Affected 
ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions 
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | The destination operand was in a non-writable segment.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
SCAS Scan String
SCASB 
SCASW 
SCASD 
SCASQ 


Compares the AL, AX, EAX, or RAX register with the byte, word, doubleword, or quadword pointed 
to by ES:rDI, sets the status flags in the rFLAGS register according to the results, and then increments 
or decrements the rDI register according to the state of the DF flag in the rFLAGS register. 


If the DF flag is 0, the instruction increments the rDI register; otherwise, it decrements it. The 
instruction increments or decrements the rDI register by 1, 2, 4, or 8, depending on the size of the 
operands. 


The forms of the SCASx instruction with an explicit operand address the operand at ES:rDI. The 
explicit operand serves only to specify the size of the values being compared. 


The no-operands forms of the instruction use the ES:rDI registers to point to the value to be compared. 
The mnemonic determines the size of the operands and the specific register containing the other 
comparison value. 


For block comparisons, the SCASx instructions support the REPE or REPZ prefixes (they are 
synonyms) and the REPNE or REPNZ prefixes (they are synonyms). For details about the REP 
prefixes, see “Repeat Prefixes” on page 12. A SCASx instruction can also operate inside a loop 
controlled by the LOOPcc instruction. 
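
A block scan with the REPNE prefix can be modeled as a loop that repeats while rCX is nonzero and the compare has not set ZF. The following Python sketch is illustrative only: the function name repne_scasb, the use of a bytes object to stand in for memory at ES:rDI, and the use of None for an unmodified ZF are assumptions made for this example.

```python
def repne_scasb(memory, rdi, al, rcx, df=0):
    """Model REPNE SCASB; returns (rdi, rcx, zf) after the scan stops."""
    step = -1 if df else 1            # DF = 0 increments rDI, DF = 1 decrements it
    zf = None                         # stays None if rCX is 0 on entry (no compare made)
    while rcx != 0:
        zf = int(memory[rdi] == (al & 0xFF))   # compare AL with the byte at ES:rDI
        rdi += step                   # rDI moves after every compare
        rcx -= 1
        if zf == 1:                   # REPNE stops once a match sets ZF
            break
    return rdi, rcx, zf
```

Scanning b"hello\x00xyz" for a zero byte with rCX = 9 stops with rCX = 3, so the string length can be recovered as 9 - 3 - 1 = 5; note that rDI points one element past the match.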


Mnemonic   | Opcode | Description
SCAS mem8  | AE     | Compare the contents of the AL register with the byte at ES:rDI, and then increment or decrement rDI.
SCAS mem16 | AF     | Compare the contents of the AX register with the word at ES:rDI, and then increment or decrement rDI.
SCAS mem32 | AF     | Compare the contents of the EAX register with the doubleword at ES:rDI, and then increment or decrement rDI.
SCAS mem64 | AF     | Compare the contents of the RAX register with the quadword at ES:rDI, and then increment or decrement rDI.
SCASB      | AE     | Compare the contents of the AL register with the byte at ES:rDI, and then increment or decrement rDI.
SCASW      | AF     | Compare the contents of the AX register with the word at ES:rDI, and then increment or decrement rDI.
SCASD      | AF     | Compare the contents of the EAX register with the doubleword at ES:rDI, and then increment or decrement rDI.
SCASQ      | AF     | Compare the contents of the RAX register with the quadword at ES:rDI, and then increment or decrement rDI.
Related Instructions 
CMP, CMPSx 
rFLAGS Affected 
ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions 
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
General protection, #GP |      |              | X         | A null ES segment was used to reference memory.
                        | X    | X            | X         | A memory address exceeded the ES segment limit or was non-canonical.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
SETcc Set Byte on Condition


Checks the status flags in the rFLAGS register and, if the flags meet the condition specified in the 
mnemonic (cc), sets the value in the specified 8-bit memory location or register to 1. If the flags do not 
meet the specified condition, SETcc clears the memory location or register to 0. 


Mnemonics with the A (above) and B (below) tags are intended for use when performing unsigned 
integer comparisons; those with G (greater) and L (less) tags are intended for use with signed integer 
comparisons. 


Software typically uses the SETcc instructions to set logical indicators. Like the CMOVcc instructions 
(page 152), the SETcc instructions can replace two instructions—a conditional jump and a move. 
Replacing conditional jumps with conditional sets can help avoid branch-prediction penalties that may 
result from conditional jumps. 


If the logical value “true” (logical one) is represented in a high-level language as an integer with all 
bits set to 1, software can accomplish such representation by first executing the opposite SETcc 
instruction—for example, the opposite of SETZ is SETNZ—and then decrementing the result. 
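
The all-ones technique described above works because decrementing 0 wraps around to all ones, while decrementing 1 gives 0. The following Python sketch is illustrative only; the helper names setcc and all_ones_if, and the width parameter, are assumptions made for this example.

```python
def setcc(condition):
    """Model SETcc: 1 if the condition holds, otherwise 0."""
    return 1 if condition else 0

def all_ones_if(condition, width=8):
    """Return a width-bit all-ones value when condition holds, otherwise 0."""
    value = setcc(not condition)          # the opposite SETcc: 0 when the condition is true
    return (value - 1) & ((1 << width) - 1)   # decrement: 0 becomes all ones, 1 becomes 0
```

For a byte result, all_ones_if(True) yields FFh and all_ones_if(False) yields 0. A wider result assumes the byte was zero-extended before the decrement.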


A ModR/M byte is used to identify the operand. The reg field in the ModR/M byte is unused. 


Mnemonic         | Opcode   | Description
SETO reg/mem8    | 0F 90 /0 | Set byte if overflow (OF = 1).
SETNO reg/mem8   | 0F 91 /0 | Set byte if not overflow (OF = 0).
SETB reg/mem8    | 0F 92 /0 | Set byte if below (CF = 1).
SETC reg/mem8    | 0F 92 /0 | Set byte if carry (CF = 1).
SETNAE reg/mem8  | 0F 92 /0 | Set byte if not above or equal (CF = 1).
SETNB reg/mem8   | 0F 93 /0 | Set byte if not below (CF = 0).
SETNC reg/mem8   | 0F 93 /0 | Set byte if not carry (CF = 0).
SETAE reg/mem8   | 0F 93 /0 | Set byte if above or equal (CF = 0).
SETZ reg/mem8    | 0F 94 /0 | Set byte if zero (ZF = 1).
SETE reg/mem8    | 0F 94 /0 | Set byte if equal (ZF = 1).
SETNZ reg/mem8   | 0F 95 /0 | Set byte if not zero (ZF = 0).
SETNE reg/mem8   | 0F 95 /0 | Set byte if not equal (ZF = 0).
SETBE reg/mem8   | 0F 96 /0 | Set byte if below or equal (CF = 1 or ZF = 1).
SETNA reg/mem8   | 0F 96 /0 | Set byte if not above (CF = 1 or ZF = 1).
SETNBE reg/mem8  | 0F 97 /0 | Set byte if not below or equal (CF = 0 and ZF = 0).
SETA reg/mem8    | 0F 97 /0 | Set byte if above (CF = 0 and ZF = 0).
SETS reg/mem8    | 0F 98 /0 | Set byte if sign (SF = 1).
SETNS reg/mem8   | 0F 99 /0 | Set byte if not sign (SF = 0).
SETP reg/mem8    | 0F 9A /0 | Set byte if parity (PF = 1).
SETPE reg/mem8   | 0F 9A /0 | Set byte if parity even (PF = 1).
SETNP reg/mem8   | 0F 9B /0 | Set byte if not parity (PF = 0).
SETPO reg/mem8   | 0F 9B /0 | Set byte if parity odd (PF = 0).
SETL reg/mem8    | 0F 9C /0 | Set byte if less (SF <> OF).
SETNGE reg/mem8  | 0F 9C /0 | Set byte if not greater or equal (SF <> OF).
SETNL reg/mem8   | 0F 9D /0 | Set byte if not less (SF = OF).
SETGE reg/mem8   | 0F 9D /0 | Set byte if greater or equal (SF = OF).
SETLE reg/mem8   | 0F 9E /0 | Set byte if less or equal (ZF = 1 or SF <> OF).
SETNG reg/mem8   | 0F 9E /0 | Set byte if not greater (ZF = 1 or SF <> OF).
SETNLE reg/mem8  | 0F 9F /0 | Set byte if not less or equal (ZF = 0 and SF = OF).
SETG reg/mem8    | 0F 9F /0 | Set byte if greater (ZF = 0 and SF = OF).
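
The difference between the unsigned (A, B) and signed (G, L) conditions in the table above can be seen by evaluating them on the flags that a preceding compare would produce. The following Python sketch is illustrative only: cmp_flags is a hypothetical helper that models the CF, ZF, SF, and OF results of subtracting b from a, and seta and setl are names chosen for this example.

```python
def cmp_flags(a, b, size=8):
    """Hypothetical model of the flags produced by comparing a with b (a - b)."""
    mask = (1 << size) - 1
    sign = 1 << (size - 1)
    a &= mask
    b &= mask
    r = (a - b) & mask
    cf = int(a < b)                              # unsigned borrow
    zf = int(r == 0)
    sf = int(bool(r & sign))
    of = int(bool((a ^ b) & (a ^ r) & sign))     # signed overflow on subtraction
    return cf, zf, sf, of

def seta(cf, zf):
    """SETA/SETNBE: above, for unsigned comparisons (CF = 0 and ZF = 0)."""
    return int(cf == 0 and zf == 0)

def setl(sf, of):
    """SETL/SETNGE: less, for signed comparisons (SF <> OF)."""
    return int(sf != of)
```

Comparing FFh with 01h sets both results: FFh is above 01h as an unsigned value (255 > 1) and less than 01h as a signed value (-1 < 1).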
Related Instructions 
None 
rFLAGS Affected 
None 
Exceptions 
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | The destination operand was in a non-writable segment.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
SFENCE Store Fence


Acts as a barrier to force strong memory ordering (serialization) between store instructions preceding 
the SFENCE and store instructions that follow the SFENCE. Stores to differing memory types, or 
within the WC memory type, may become visible out of program order; the SFENCE instruction 
ensures that the system completes all previous stores in such a way that they are globally visible before 
executing subsequent stores. This includes emptying the store buffer and all write-combining buffers. 


The SFENCE instruction is weakly-ordered with respect to load instructions, data and instruction 
prefetches, and the LFENCE instruction. Speculative loads initiated by the processor, or specified 
explicitly using cache-prefetch instructions, can be reordered around an SFENCE. 


In addition to store instructions, SFENCE is strongly ordered with respect to other SFENCE 
instructions, MFENCE instructions, and serializing instructions. Further details on the use of 
MFENCE to order accesses among differing memory types may be found in AMD64 Architecture 
Programmer’s Manual Volume 2: System Programming, section 7.4 “Memory Types” on page 172. 


The SFENCE instruction is an SSE1 instruction. Support for SSE1 instructions is indicated by CPUID 
Fn0000_0001_EDX[SSE] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic | Opcode   | Description
SFENCE   | 0F AE F8 | Force strong ordering of (serialized) store operations.


Related Instructions 


LFENCE, MFENCE, MCOMMIT 


rFLAGS Affected 
None 
Exceptions 
Exception           | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X    | X            | X         | The SSE instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[25] = 0; and the AMD extensions to MMX are not supported, as indicated by CPUID Fn8000_0001_EDX[22] = 0.
SHL Shift Left


This instruction is synonymous with the SAL instruction. For information, see “SAL SHL” on 
page 315. 
SHLD Shift Left Double


Shifts the bits of a register or memory location (first operand) to the left by the number of bit positions 
in an unsigned immediate value or the CL register (third operand), and shifts in a bit pattern (second 
operand) from the right. At the end of the shift operation, the CF flag contains the last bit shifted out of 
the first operand. 


The processor masks the upper three bits of the count operand, thus restricting the count to a number 
between 0 and 31. When the destination is 64 bits wide, the processor masks the upper two bits of the 
count, providing a count in the range of 0 to 63. If the masked count is greater than the operand size, 
the result in the destination register is undefined. 


If the shift count is 0, no flags are modified. 


If the count is 1 and the sign of the operand being shifted changes, the instruction sets the OF flag to 1. 
If the count is greater than 1, OF is undefined. 
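
The double shift can be pictured as shifting the concatenation of the first and second operands left and keeping the upper half. The following Python sketch is illustrative only; the function name shld, its size parameter, the use of None for an unmodified CF, and the choice to raise an error for the undefined case are assumptions made for this example. The OF result is not modeled.

```python
def shld(dest, src, count, size=32):
    """Model SHLD on size-bit operands; returns (result, cf)."""
    mask = (1 << size) - 1
    dest &= mask
    src &= mask
    # The count is masked to 6 bits for 64-bit operands and to 5 bits otherwise.
    count &= 0x3F if size == 64 else 0x1F
    if count == 0:
        return dest, None                 # a zero count modifies no flags
    if count > size:
        raise ValueError("result is undefined when the masked count exceeds the operand size")
    wide = (dest << size) | src           # dest:src as one double-width value
    shifted = wide << count
    result = (shifted >> size) & mask     # bits of src enter dest from the right
    cf = (shifted >> (2 * size)) & 1      # last bit shifted out of dest
    return result, cf
```

For example, shifting 12345678h left by 8 with 9ABCDEF0h as the second operand yields 3456789Ah: the top byte of the second operand fills the vacated low byte.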


Mnemonic                    | Opcode      | Description
SHLD reg/mem16, reg16, imm8 | 0F A4 /r ib | Shift bits of a 16-bit destination register or memory operand to the left the number of bits specified in an 8-bit immediate value, while shifting in bits from the second operand.
SHLD reg/mem16, reg16, CL   | 0F A5 /r    | Shift bits of a 16-bit destination register or memory operand to the left the number of bits specified in the CL register, while shifting in bits from the second operand.
SHLD reg/mem32, reg32, imm8 | 0F A4 /r ib | Shift bits of a 32-bit destination register or memory operand to the left the number of bits specified in an 8-bit immediate value, while shifting in bits from the second operand.
SHLD reg/mem32, reg32, CL   | 0F A5 /r    | Shift bits of a 32-bit destination register or memory operand to the left the number of bits specified in the CL register, while shifting in bits from the second operand.
SHLD reg/mem64, reg64, imm8 | 0F A4 /r ib | Shift bits of a 64-bit destination register or memory operand to the left the number of bits specified in an 8-bit immediate value, while shifting in bits from the second operand.
SHLD reg/mem64, reg64, CL   | 0F A5 /r    | Shift bits of a 64-bit destination register or memory operand to the left the number of bits specified in the CL register, while shifting in bits from the second operand.


Related Instructions 


SHRD, SAL, SAR, SHR, SHL 
rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | U  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions 
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | The destination operand was in a non-writable segment.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
SHLX Shift Left Logical Extended


Shifts the bits of the first source operand left (toward the most-significant bit) by the number of bit 
positions specified in the second source operand and writes the result to the destination. Does not 
affect the arithmetic flags. 
This instruction has three operands: 

SHLX dest, src, shft_cnt 


On each left-shift, a zero is shifted into the least-significant bit position. This instruction performs a 
non-destructive operation; that is, the contents of the source operand are unaffected by the operation, 
unless the destination and source are the same general-purpose register. 


In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand 
size is 64 bits; if VEX.W is 0, the operand size is 32 bits. In 32-bit mode, VEX.W is ignored. 16-bit 
operands are not supported. 


The destination (dest) is a general-purpose register and the first source (src) is either a general-purpose 
register or a memory operand. The second source operand shft_cnt is a general-purpose register. When
the operand size is 32, bits [31:5] of shft_cnt are ignored; when the operand size is 64, bits [63:6] of 
shft_cnt are ignored. 


This instruction is a BMI2 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic                     | Encoding
                             | VEX | RXB.map_select | W.vvvv.L.pp | Opcode
SHLX reg32, reg/mem32, reg32 | C4  | RXB.02         | 0.src2.0.01 | F7 /r
SHLX reg64, reg/mem64, reg64 | C4  | RXB.02         | 1.src2.0.01 | F7 /r


Related Instructions 
RORX, SARX, SHRX 


rFLAGS Affected 


None. 
Exceptions
Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD     | X    | X            |           | BMI2 instructions are only recognized in protected mode.
                        |      |              | X         | BMI2 instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI2] = 0.
                        |      |              | X         | VEX.L is 1.
Stack, #SS              |      |              | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP |      |              | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      |              | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      |              | X         | An unaligned memory reference was performed while alignment checking was enabled.
SHR Shift Right


Shifts the bits of a register or memory location (first operand) to the right through the CF bit by the 
number of bit positions in an unsigned immediate value or the CL register (second operand). The 
instruction discards bits shifted out of the CF flag. At the end of the shift operation, the CF flag 
contains the last bit shifted out of the first operand. 


For each bit shift, the instruction clears the most-significant bit to 0. 
The effect of this instruction is unsigned division by powers of two. 


The processor masks the upper three bits of the count operand, thus restricting the count to a number 
between 0 and 31. When the destination is 64 bits wide, the processor masks the upper two bits of the 
count, providing a count in the range of 0 to 63. 


For 1-bit shifts, the instruction sets the OF flag to the most-significant bit of the original value. If the 
count is greater than 1, the OF flag is undefined. 


If the shift count is 0, no flags are modified. 
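
The rules above can be collected into a short model that also shows the unsigned-division effect. The following Python sketch is illustrative only; the function name shr, its size parameter, and the use of None for an unmodified or undefined flag are assumptions made for this example, and it is intended for shift counts smaller than the operand width.

```python
def shr(value, count, size=32):
    """Model SHR on a size-bit operand; returns (result, cf, of)."""
    mask = (1 << size) - 1
    value &= mask
    # The count is masked to 6 bits for 64-bit operands and to 5 bits otherwise.
    count &= 0x3F if size == 64 else 0x1F
    if count == 0:
        return value, None, None          # a zero count modifies no flags
    result = value >> count               # zeros enter at the most-significant end
    cf = (value >> (count - 1)) & 1       # last bit shifted out of the operand
    # For 1-bit shifts, OF is the most-significant bit of the original value.
    of = (value >> (size - 1)) & 1 if count == 1 else None
    return result, cf, of
```

In this model, shr(100, 2) returns a result of 25, which equals the unsigned quotient of 100 divided by 4.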


Mnemonic            | Opcode   | Description
SHR reg/mem8, 1     | D0 /5    | Shift an 8-bit register or memory operand right 1 bit.
SHR reg/mem8, CL    | D2 /5    | Shift an 8-bit register or memory operand right the number of bits specified in the CL register.
SHR reg/mem8, imm8  | C0 /5 ib | Shift an 8-bit register or memory operand right the number of bits specified by an 8-bit immediate value.
SHR reg/mem16, 1    | D1 /5    | Shift a 16-bit register or memory operand right 1 bit.
SHR reg/mem16, CL   | D3 /5    | Shift a 16-bit register or memory operand right the number of bits specified in the CL register.
SHR reg/mem16, imm8 | C1 /5 ib | Shift a 16-bit register or memory operand right the number of bits specified by an 8-bit immediate value.
SHR reg/mem32, 1    | D1 /5    | Shift a 32-bit register or memory operand right 1 bit.
SHR reg/mem32, CL   | D3 /5    | Shift a 32-bit register or memory operand right the number of bits specified in the CL register.
SHR reg/mem32, imm8 | C1 /5 ib | Shift a 32-bit register or memory operand right the number of bits specified by an 8-bit immediate value.
SHR reg/mem64, 1    | D1 /5    | Shift a 64-bit register or memory operand right 1 bit.
SHR reg/mem64, CL   | D3 /5    | Shift a 64-bit register or memory operand right the number of bits specified in the CL register.
SHR reg/mem64, imm8 | C1 /5 ib | Shift a 64-bit register or memory operand right the number of bits specified by an 8-bit immediate value.
Related Instructions


SHL, SAL, SAR, SHLD, SHRD


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | U  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS               X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP  X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                             X          The destination operand was in a non-writable segment.
                                             X          A null data segment was used to reference memory.
Page fault, #PF                X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC           X             X          An unaligned memory reference was performed while alignment checking was enabled.
SHRD Shift Right Double


Shifts the bits of a register or memory location (first operand) to the right by the number of bit 
positions in an unsigned immediate value or the CL register (third operand), and shifts in a bit pattern 
(second operand) from the left. At the end of the shift operation, the CF flag contains the last bit shifted 
out of the first operand. 


The processor masks the upper three bits of the count operand, thus restricting the count to a number 
between 0 and 31. When the destination is 64 bits wide, the processor masks the upper two bits of the 
count, providing a count in the range of 0 to 63. If the masked count is greater than the operand size, 
the result in the destination register is undefined. 


If the shift count is 0, no flags are modified. 


If the count is 1 and the sign of the value being shifted changes, the instruction sets the OF flag to 1. If 
the count is greater than 1, the OF flag is undefined. 
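
The double-shift operation described above can be sketched in C for the 32-bit operand size. The function name `shrd32` and its parameter names are hypothetical and chosen only for this illustration; the sketch models the result and the CF flag and does not attempt to model OF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of a 32-bit SHRD; the name shrd32 is hypothetical.
 * dest  : first operand (value being shifted)
 * src   : second operand (supplies the bits shifted in from the left)
 * count : third operand (imm8 or CL)
 * cf    : updated only when the masked count is non-zero */
uint32_t shrd32(uint32_t dest, uint32_t src, uint8_t count, bool *cf)
{
    unsigned n = count & 0x1Fu;          /* upper three count bits are masked */

    if (n == 0)
        return dest;                     /* count of 0: no flags are modified */

    *cf = (dest >> (n - 1)) & 1u;        /* last bit shifted out of dest */
    return (dest >> n) | (src << (32 - n));
}
```

Because the masked count for a 32-bit operand never exceeds 31, both shift amounts in the final expression stay within the range that C defines for a 32-bit type.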


Mnemonic                       Opcode        Description
SHRD reg/mem16, reg16, imm8    0F AC /r ib   Shift bits of a 16-bit destination register or memory operand to the right the number of bits specified in an 8-bit immediate value, while shifting in bits from the second operand.
SHRD reg/mem16, reg16, CL      0F AD /r      Shift bits of a 16-bit destination register or memory operand to the right the number of bits specified in the CL register, while shifting in bits from the second operand.
SHRD reg/mem32, reg32, imm8    0F AC /r ib   Shift bits of a 32-bit destination register or memory operand to the right the number of bits specified in an 8-bit immediate value, while shifting in bits from the second operand.
SHRD reg/mem32, reg32, CL      0F AD /r      Shift bits of a 32-bit destination register or memory operand to the right the number of bits specified in the CL register, while shifting in bits from the second operand.
SHRD reg/mem64, reg64, imm8    0F AC /r ib   Shift bits of a 64-bit destination register or memory operand to the right the number of bits specified in an 8-bit immediate value, while shifting in bits from the second operand.
SHRD reg/mem64, reg64, CL      0F AD /r      Shift bits of a 64-bit destination register or memory operand to the right the number of bits specified in the CL register, while shifting in bits from the second operand.


Related Instructions 


SHLD, SHR, SHL, SAR, SAL 
rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | U  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS               X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP  X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                             X          The destination operand was in a non-writable segment.
                                             X          A null data segment was used to reference memory.
Page fault, #PF                X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC           X             X          An unaligned memory reference was performed while alignment checking was enabled.
SHRX Shift Right Logical Extended


Shifts the bits of the first source operand right (toward the least-significant bit) by the number of bit 
positions specified in the second source operand and writes the result to the destination. Does not 
affect the arithmetic flags. 
This instruction has three operands: 

SHRX dest, src, shft_cnt 


On each right-shift, a zero is shifted into the most-significant bit position. This instruction performs a 
non-destructive operation; that is, the contents of the source operand are unaffected by the operation, 
unless the destination and source are the same general-purpose register. 


In 64-bit mode, the operand size is determined by the value of VEX.W. If VEX.W is 1, the operand 
size is 64 bits; if VEX.W is 0, the operand size is 32 bits. In 32-bit mode, VEX.W is ignored. 16-bit 
operands are not supported. 


The destination (dest) is a general-purpose register and the first source (src) is either a general-purpose 
register or a memory operand. The second source operand shft_cnt is a general-purpose register. When
the operand size is 32, bits [31:5] of shft_cnt are ignored; when the operand size is 64, bits [63:6] of 
shft_cnt are ignored. 
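
The treatment of the shift count can be illustrated with a short C sketch. The function names `shrx32` and `shrx64` are hypothetical and model only the arithmetic result; they are not compiler intrinsics. As stated above, no arithmetic flags are involved.

```c
#include <stdint.h>

/* Illustrative models of SHRX; the function names are hypothetical. */

/* 32-bit operand size: bits [31:5] of shft_cnt are ignored. */
uint32_t shrx32(uint32_t src, uint32_t shft_cnt)
{
    return src >> (shft_cnt & 31u);
}

/* 64-bit operand size: bits [63:6] of shft_cnt are ignored. */
uint64_t shrx64(uint64_t src, uint64_t shft_cnt)
{
    return src >> (shft_cnt & 63u);
}
```

A count of 33 with a 32-bit operand therefore behaves as a shift by 1, and a count of 64 with a 64-bit operand leaves the value unchanged.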


This instruction is a BMI2 instruction. Support for this instruction is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI2] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


VEX RXB.map_select W.vvvv.L.pp Opcode 
SHRX reg32, reg/mem32, reg32 C4 RXB.02 0.src2.0.11 F7 /r 
SHRX reg64, reg/mem64, reg64 C4 RXB.02 1.src2.0.11 F7 /r 


Related Instructions 
RORX, SARX, SHLX 


rFLAGS Affected 


None. 
Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X                        BMI2 instructions are only recognized in protected mode.
                                             X          BMI2 instructions are not supported, as indicated by CPUID Fn0000_0007_EBX_x0[BMI2] = 0.
                                             X          VEX.L is 1.
Stack, #SS                                   X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                      X          A memory address exceeded a data segment limit or was non-canonical.
                                             X          A null data segment was used to reference memory.
Page fault, #PF                              X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                         X          An unaligned memory reference was performed while alignment checking was enabled.
SLWPCB Store Lightweight Profiling Control Block Address


Flushes Lightweight Profiling (LWP) state to memory and returns the current effective address of the 
Lightweight Profiling Control Block (LWPCB) in the specified register. The LWPCB address returned 
is truncated to 32 bits if the operand size is 32. 


If LWP is not currently enabled, SLWPCB sets the specified register to zero. 


The flush operation stores the internal event counters for active events and the current ring buffer head 
pointer into the LWPCB. If there is an unwritten event record pending, it is written to the event ring 
buffer. 


The LWP_CBADDR MSR holds the linear address of the current LWPCB. If the contents of
LWP_CBADDR are not zero, the value returned in the specified register is an effective address that is
calculated by subtracting the current DS.Base address from the linear address kept in LWP_CBADDR. 
Note that if DS has changed between the time LLWPCB was executed and the time SLWPCB is 
executed, this might result in an address that is not currently accessible by the application. 


SLWPCB generates an invalid opcode exception (#UD) if the machine is not in protected mode or if 
LWP is not available. 


It is possible to execute SLWPCB when the CPL != 3 or when SMM is active, but if the LWPCB 
pointer is not zero, system software must ensure that the LWPCB and the entire ring buffer are 
properly mapped into writable memory in order to avoid a #PF fault. Using SLWPCB in these 
situations is not recommended. 


See the discussion of lightweight profiling in Volume 2, Chapter 13 for more information on the use of 
the LLWPCB, SLWPCB, LWPINS, and LWPVAL instructions. 


The SLWPCB instruction is implemented if LWP is supported on a processor. Support for LWP is 
indicated by CPUID Fn8000_0001_ECX[LWP] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Instruction Encoding 


Mnemonic Encoding 
XOP RXB.map_select W.vvvv.L.pp Opcode 
SLWPCB reg32 8F RXB.09 0.1111.0.00 12 /1
SLWPCB reg64 8F RXB.09 1.1111.0.00 12 /1


ModRM.reg augments the opcode and is assigned the value 001b. ModRM.r/m (augmented by 
XOP.R) specifies the register in which to put the LWPCB address. ModRM.mod must be 11b. 
Related Instructions


LLWPCB, LWPINS, LWPVAL


rFLAGS Affected

None

Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X             X          The SLWPCB instruction is not supported, as indicated by CPUID Fn8000_0001_ECX[LWP] = 0.
                         X     X                        The system is not in protected mode.
                                             X          LWP is not available, or mod != 11b, or vvvv != 1111b.
Page fault, #PF                              X          A page fault resulted from reading or writing the LWPCB.
                                             X          A page fault resulted from flushing an event to the ring buffer.
STC Set Carry Flag
Sets the carry flag (CF) in the rFLAGS register to one. 


Mnemonic Opcode Description 
STC F9 Set the carry flag (CF) to one. 
Related Instructions 
CLC, CMC 
rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       |    |    |    |    |    |    |    |    | 1
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 


None 
STD Set Direction Flag


Sets the direction flag (DF) in the rFLAGS register to 1. If the DF flag is 0, each iteration of a string
instruction increments the data pointer (index registers rSI or rDI). If the DF flag is 1, the string
instruction decrements the pointer. Use the CLD instruction before a string instruction to make the
data pointer increment.


Mnemonic Opcode Description 


STD FD Set the direction flag (DF) to one. 
Related Instructions 
CLD, INSx, LODSx, MOVSx, OUTSx, SCASx, STOSx, CMPSx 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       |    | 1  |    |    |    |    |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 


None 
STOS, STOSB, STOSW, STOSD, STOSQ Store String


Copies a byte, word, doubleword, or quadword from the AL, AX, EAX, or RAX registers to the
memory location pointed to by ES:rDI and increments or decrements the rDI register according to the
state of the DF flag in the rFLAGS register.


If the DF flag is 0, the instruction increments the pointer; otherwise, it decrements the pointer. It 
increments or decrements the pointer by 1, 2, 4, or 8, depending on the size of the value being copied. 


The forms of the STOSx instruction with an explicit operand use the operand only to specify the type
(size) of the value being copied.


The no-operands forms specify the type (size) of the value being copied with the mnemonic. 


The STOSx instructions support the REP prefixes. For details about the REP prefixes, see “Repeat 
Prefixes” on page 12. The STOSx instructions can also operate inside a LOOPcc instruction. 
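
The store-and-advance behavior, including the effect of the DF flag, can be sketched in C. The function `rep_stosd` below is a hypothetical model of a doubleword store repeated `count` times, as with a REP prefix; it ignores segmentation and uses an element index `di` in place of the byte address held in rDI, so each step of 1 in the model corresponds to an adjustment of 4 bytes in the register.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of REP STOSD; the name and signature are hypothetical.
 * mem   : flat array standing in for the ES segment
 * di    : element index standing in for rDI
 * eax   : value to store
 * count : repeat count (the role played by rCX under a REP prefix)
 * df    : direction flag; false increments, true decrements
 * Returns the index the pointer holds after the last store. */
ptrdiff_t rep_stosd(uint32_t *mem, ptrdiff_t di, uint32_t eax,
                    size_t count, bool df)
{
    while (count-- > 0) {
        mem[di] = eax;          /* store first ... */
        di += df ? -1 : 1;      /* ... then adjust the pointer */
    }
    return di;
}
```

With DF cleared the model fills memory upward from the starting index; with DF set it fills downward, which is why software normally executes CLD before a forward fill.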


Mnemonic      Opcode   Description
STOS mem8     AA       Store the contents of the AL register to ES:rDI, and then increment or decrement rDI.
STOS mem16    AB       Store the contents of the AX register to ES:rDI, and then increment or decrement rDI.
STOS mem32    AB       Store the contents of the EAX register to ES:rDI, and then increment or decrement rDI.
STOS mem64    AB       Store the contents of the RAX register to ES:rDI, and then increment or decrement rDI.
STOSB         AA       Store the contents of the AL register to ES:rDI, and then increment or decrement rDI.
STOSW         AB       Store the contents of the AX register to ES:rDI, and then increment or decrement rDI.
STOSD         AB       Store the contents of the EAX register to ES:rDI, and then increment or decrement rDI.
STOSQ         AB       Store the contents of the RAX register to ES:rDI, and then increment or decrement rDI.


Related Instructions


LODSx, MOVSx
rFLAGS Affected

None

Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
General protection, #GP  X     X             X          A memory address exceeded the ES segment limit or was non-canonical.
                                             X          The ES segment was a non-writable segment.
                                             X          A null ES segment was used to reference memory.
Page fault, #PF                X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC           X             X          An unaligned memory reference was performed while alignment checking was enabled.
SUB Subtract


Subtracts an immediate value or the value in a register or memory location (second operand) from a 
register or a memory location (first operand) and stores the result in the first operand location. An 
immediate value is sign-extended to the length of the first operand. 


This instruction evaluates the result for both signed and unsigned data types and sets the OF and CF 
flags to indicate a borrow in a signed or unsigned result, respectively. It sets the SF flag to indicate the 
sign of a signed result. 
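
The relationship between the result and the OF, CF, and SF flags can be sketched in C for 32-bit operands. The type `sub32_result` and the function `sub32` are hypothetical names used only for this illustration; ZF is included because the flags table for this instruction marks it as modified, while AF and PF are omitted for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of a 32-bit SUB; all names are hypothetical. */
typedef struct {
    uint32_t result;
    bool cf;   /* borrow when the operands are treated as unsigned */
    bool of;   /* overflow when the operands are treated as signed */
    bool sf;   /* sign (most-significant bit) of the result */
    bool zf;   /* result is zero */
} sub32_result;

sub32_result sub32(uint32_t a, uint32_t b)
{
    sub32_result r;
    r.result = a - b;                       /* wraps modulo 2^32 */
    r.cf = a < b;                           /* unsigned borrow */
    /* Signed overflow: the operands differ in sign and the result's
     * sign differs from the sign of the first operand. */
    r.of = (((a ^ b) & (a ^ r.result)) >> 31) & 1u;
    r.sf = (r.result >> 31) & 1u;
    r.zf = (r.result == 0);
    return r;
}
```

For example, subtracting 1 from 0 produces a borrow (CF) without signed overflow, while subtracting 1 from 80000000h overflows the signed range (OF) without a borrow.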


The forms of the SUB instruction that write to memory support the LOCK prefix. For details about the 
LOCK prefix, see “Lock Prefix” on page 11. 


Mnemonic                  Opcode      Description
SUB AL, imm8              2C ib       Subtract an immediate 8-bit value from the AL register and store the result in AL.
SUB AX, imm16             2D iw       Subtract an immediate 16-bit value from the AX register and store the result in AX.
SUB EAX, imm32            2D id       Subtract an immediate 32-bit value from the EAX register and store the result in EAX.
SUB RAX, imm32            2D id       Subtract a sign-extended immediate 32-bit value from the RAX register and store the result in RAX.
SUB reg/mem8, imm8        80 /5 ib    Subtract an immediate 8-bit value from an 8-bit destination register or memory location.
SUB reg/mem16, imm16      81 /5 iw    Subtract an immediate 16-bit value from a 16-bit destination register or memory location.
SUB reg/mem32, imm32      81 /5 id    Subtract an immediate 32-bit value from a 32-bit destination register or memory location.
SUB reg/mem64, imm32      81 /5 id    Subtract a sign-extended immediate 32-bit value from a 64-bit destination register or memory location.
SUB reg/mem16, imm8       83 /5 ib    Subtract a sign-extended immediate 8-bit value from a 16-bit register or memory location.
SUB reg/mem32, imm8       83 /5 ib    Subtract a sign-extended immediate 8-bit value from a 32-bit register or memory location.
SUB reg/mem64, imm8       83 /5 ib    Subtract a sign-extended immediate 8-bit value from a 64-bit register or memory location.
SUB reg/mem8, reg8        28 /r       Subtract the contents of an 8-bit register from an 8-bit destination register or memory location.
SUB reg/mem16, reg16      29 /r       Subtract the contents of a 16-bit register from a 16-bit destination register or memory location.
SUB reg/mem32, reg32      29 /r       Subtract the contents of a 32-bit register from a 32-bit destination register or memory location.
SUB reg/mem64, reg64      29 /r       Subtract the contents of a 64-bit register from a 64-bit destination register or memory location.
SUB reg8, reg/mem8        2A /r       Subtract the contents of an 8-bit register or memory operand from an 8-bit destination register.
SUB reg16, reg/mem16      2B /r       Subtract the contents of a 16-bit register or memory operand from a 16-bit destination register.
SUB reg32, reg/mem32      2B /r       Subtract the contents of a 32-bit register or memory operand from a 32-bit destination register.
SUB reg64, reg/mem64      2B /r       Subtract the contents of a 64-bit register or memory operand from a 64-bit destination register.
Related Instructions 
ADC, ADD, SBB 
rFLAGS Affected 
ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS               X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP  X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                             X          The destination operand was in a non-writable segment.
                                             X          A null data segment was used to reference memory.
Page fault, #PF                X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC           X             X          An unaligned memory reference was performed while alignment checking was enabled.
T1MSKC Inverse Mask From Trailing Ones
Finds the least significant zero bit in the source operand, clears all bits below that bit to 0, sets all other 
bits to 1 (including the found bit) and writes the result to the destination. If the least significant bit of 
the source operand is 0, the destination is written with all ones. 
This instruction has two operands: 

TIMSKC dest, src 


In 64-bit mode, the operand size is determined by the value of XOP.W. If XOP.W is 1, the operand size 
is 64-bit; if XOP.W is 0, the operand size is 32-bit. In 32-bit mode, XOP.W is ignored. 16-bit operands 
are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is a general purpose register or a memory operand. 


The T1MSKC instruction effectively performs a bit-wise logical or of the inverse of the source 
operand and the result of incrementing the source operand by 1 and stores the result to the destination 
register: 


add tmp1, src, 1
not tmp2, src 
or dest, tmp1, tmp2 


The value of the carry flag of rFLAGs is generated by the add pseudo-instruction and the remaining 
arithmetic flags are generated by the or pseudo-instruction. 
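
The pseudo-instruction sequence above corresponds to a single C expression. The function name `t1mskc32` is hypothetical and the sketch covers only the 32-bit result, not the flags.

```c
#include <stdint.h>

/* Illustrative model of 32-bit T1MSKC; the name is hypothetical.
 * Equivalent to: add tmp1, src, 1 / not tmp2, src / or dest, tmp1, tmp2 */
uint32_t t1mskc32(uint32_t src)
{
    return ~src | (src + 1u);   /* src + 1 wraps modulo 2^32 */
}
```

For a source of 00000005h (binary ...0101) the least-significant zero bit is bit 1, so only bit 0 is cleared and the result is FFFFFFFEh.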


The T1MSKC instruction is a TBM instruction. Support for this instruction is indicated by CPUID 
Fn8000_0001_ECX[TBM] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


XOP RXB.map_select W.vvvv.L.pp Opcode 
T1MSKC reg32, reg/mem32 8F RXB.09 0.dest.0.00 01 /7
T1MSKC reg64, reg/mem64 8F RXB.09 1.dest.0.00 01 /7


Related Instructions 


ANDN, BEXTR, BLCFILL, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR, 
BLSMSK, BSF, BSR, LZCNT, POPCNT, TZMSK, TZCNT 
rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X                        TBM instructions are only recognized in protected mode.
                                             X          TBM instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[TBM] = 0.
                                             X          XOP.L is 1.
Stack, #SS                                   X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                      X          A memory address exceeded a data segment limit or was non-canonical.
                                             X          A null data segment was used to reference memory.
Page fault, #PF                              X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                         X          An unaligned memory reference was performed while alignment checking was enabled.
TEST Test Bits


Performs a bit-wise logical and on the value in a register or memory location (first operand) with an
immediate value or the value in a register (second operand) and sets the flags in the rFLAGS register
based on the result.


This instruction has two operands: 


TEST dest, src 


While the AND instruction changes the contents of the destination and the flag bits, the TEST 
instruction changes only the flag bits. 
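
The difference from AND can be illustrated with a C sketch that computes the temporary result and derives flags from it without writing either operand. The type `test_flags` and the function `test32` are hypothetical names; only ZF and SF are modeled, and the sketch assumes 32-bit operands.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of 32-bit TEST; all names are hypothetical. */
typedef struct {
    bool zf;   /* the and of the two operands is zero */
    bool sf;   /* most-significant bit of the and */
} test_flags;

test_flags test32(uint32_t dest, uint32_t src)
{
    uint32_t tmp = dest & src;   /* result is discarded after flag generation */
    test_flags f;
    f.zf = (tmp == 0);
    f.sf = (tmp >> 31) & 1u;
    return f;                    /* dest and src are left unchanged */
}
```

A typical use is testing whether any bit of a mask is set in a value: ZF is 1 exactly when the value and the mask have no set bits in common.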


Mnemonic                  Opcode      Description
TEST AL, imm8             A8 ib       and an immediate 8-bit value with the contents of the AL register and set rFLAGS to reflect the result.
TEST AX, imm16            A9 iw       and an immediate 16-bit value with the contents of the AX register and set rFLAGS to reflect the result.
TEST EAX, imm32           A9 id       and an immediate 32-bit value with the contents of the EAX register and set rFLAGS to reflect the result.
TEST RAX, imm32           A9 id       and a sign-extended immediate 32-bit value with the contents of the RAX register and set rFLAGS to reflect the result.
TEST reg/mem8, imm8       F6 /0 ib    and an immediate 8-bit value with the contents of an 8-bit register or memory operand and set rFLAGS to reflect the result.
TEST reg/mem16, imm16     F7 /0 iw    and an immediate 16-bit value with the contents of a 16-bit register or memory operand and set rFLAGS to reflect the result.
TEST reg/mem32, imm32     F7 /0 id    and an immediate 32-bit value with the contents of a 32-bit register or memory operand and set rFLAGS to reflect the result.
TEST reg/mem64, imm32     F7 /0 id    and a sign-extended immediate 32-bit value with the contents of a 64-bit register or memory operand and set rFLAGS to reflect the result.
TEST reg/mem8, reg8       84 /r       and the contents of an 8-bit register with the contents of an 8-bit register or memory operand and set rFLAGS to reflect the result.
TEST reg/mem16, reg16     85 /r       and the contents of a 16-bit register with the contents of a 16-bit register or memory operand and set rFLAGS to reflect the result.
TEST reg/mem32, reg32     85 /r       and the contents of a 32-bit register with the contents of a 32-bit register or memory operand and set rFLAGS to reflect the result.
TEST reg/mem64, reg64     85 /r       and the contents of a 64-bit register with the contents of a 64-bit register or memory operand and set rFLAGS to reflect the result.


Related Instructions


AND, CMP
rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | M  | 0
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS               X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP  X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                             X          The destination operand was in a non-writable segment.
                                             X          A null data segment was used to reference memory.
Page fault, #PF                X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC           X             X          An unaligned memory reference was performed while alignment checking was enabled.
TZCNT Count Trailing Zeros


Counts the number of trailing zero bits in the 16-, 32-, or 64-bit general purpose register or memory 
source operand. Counting starts upward from the least significant bit and stops when the lowest bit 
having a value of 1 is encountered or when the most significant bit is encountered. The count is written 
to the destination register. 


If the input operand is zero, CF is set to 1 and the size (in bits) of the input operand is written to the 
destination register. Otherwise, CF is cleared. 


If the least significant bit is a one, the ZF flag is set to 1 and zero is written to the destination register. 
Otherwise, ZF is cleared. 
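
The counting rule and the CF and ZF behavior described above can be modeled as follows. This is a minimal sketch for the 32-bit operand size; the type `tzcnt32_result` and the function `tzcnt32` are hypothetical names and are not compiler intrinsics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of 32-bit TZCNT; all names are hypothetical. */
typedef struct {
    uint32_t count;  /* number of trailing zero bits */
    bool     cf;     /* set when the input operand is zero */
    bool     zf;     /* set when the count is zero (bit 0 of the input is 1) */
} tzcnt32_result;

tzcnt32_result tzcnt32(uint32_t src)
{
    tzcnt32_result r = { 0, false, false };

    if (src == 0) {
        r.count = 32;            /* operand size in bits */
        r.cf = true;
        return r;
    }
    while ((src & 1u) == 0) {    /* count upward from the least-significant bit */
        src >>= 1;
        r.count++;
    }
    r.zf = (r.count == 0);
    return r;
}
```

Unlike BSF, the operation modeled here has a defined result for a zero input, which is why the CPUID check described below matters: on a processor without TZCNT the same encoding executes as BSF.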


TZCNT is a BMI instruction. Support for BMI instructions is indicated by CPUID 
Fn0000_0007_EBX_x0[BMI] = 1. If the TZCNT instruction is not available, the encoding is treated 
as the BSF instruction. Software must check the CPUID bit once per program or library initialization 
before using the TZCNT instruction or inconsistent behavior may result. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Opcode Description 

TZCNT reg16, reg/mem16    F3 0F BC /r    Count the number of trailing zeros in reg/mem16.
TZCNT reg32, reg/mem32    F3 0F BC /r    Count the number of trailing zeros in reg/mem32.
TZCNT reg64, reg/mem64    F3 0F BC /r    Count the number of trailing zeros in reg/mem64.


Related Instructions 


ANDN, BEXTR, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR, BLSMSK, BSF,
BSR, LZCNT, POPCNT, T1MSKC, TZMSK


rFLAGS Affected 
ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | U  |    |    |    | U  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 
Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Stack, #SS               X     X             X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP  X     X             X          A memory address exceeded a data segment limit or was non-canonical.
                                             X          A null data segment was used to reference memory.
Page fault, #PF                X             X          A page fault resulted from the execution of the instruction.
Alignment check, #AC           X             X          An unaligned memory reference was performed while alignment checking was enabled.
TZMSK Mask From Trailing Zeros


Finds the least significant one bit in the source operand, sets all bits below that bit to 1, clears all other 
bits to 0 (including the found bit) and writes the result to the destination. If the least significant bit of
the source operand is 1, the destination is written with all zeros. 
This instruction has two operands: 

TZMSK dest, src 


In 64-bit mode, the operand size is determined by the value of XOP.W. If XOP.W is 1, the operand size 
is 64-bit; if XOP.W is 0, the operand size is 32-bit. In 32-bit mode, XOP.W is ignored. 16-bit operands 
are not supported. 


The destination (dest) is a general purpose register. 
The source operand (src) is a general purpose register or a memory operand. 


The TZMSK instruction effectively performs a bit-wise logical and of the negation of the source 
operand and the result of subtracting 1 from the source operand, and stores the result to the destination 
register: 


sub tmp1, src, 1 
not tmp2, src 
and dest, tmp1, tmp2 


The value of the carry flag of rFLAGs is generated by the sub pseudo-instruction and the remaining 
arithmetic flags are generated by the and pseudo-instruction. 
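
The pseudo-instruction sequence above corresponds to a single C expression. The function name `tzmsk32` is hypothetical and the sketch covers only the 32-bit result, not the flags.

```c
#include <stdint.h>

/* Illustrative model of 32-bit TZMSK; the name is hypothetical.
 * Equivalent to: sub tmp1, src, 1 / not tmp2, src / and dest, tmp1, tmp2 */
uint32_t tzmsk32(uint32_t src)
{
    return ~src & (src - 1u);   /* src - 1 wraps modulo 2^32 */
}
```

For a source of 00000008h the least-significant one bit is bit 3, so the result is a mask of the three trailing zero positions, 00000007h.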


The TZMSK instruction is a TBM instruction. Support for this instruction is indicated by CPUID 
Fn8000_0001_ECX[TBM] = 1. 


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic Encoding 


XOP RXB.map_select W.vvvv.L.pp Opcode 
TZMSK reg32, reg/mem32 8F RXB.09 0.dest.0.00 01 /4
TZMSK reg64, reg/mem64 8F RXB.09 1.dest.0.00 01 /4


Related Instructions 


ANDN, BEXTR, BLCFILL, BLCI, BLCIC, BLCMSK, BLCS, BLSFILL, BLSI, BLSIC, BLSR,
BLSMSK, BSF, BSR, LZCNT, POPCNT, T1MSKC, TZCNT

rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | U  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X                        TBM instructions are only recognized in protected mode.
                                             X          TBM instructions are not supported, as indicated by CPUID Fn8000_0001_ECX[TBM] = 0.
                                             X          XOP.L is 1.
Stack, #SS                                   X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                      X          A memory address exceeded a data segment limit or was non-canonical.
                                             X          A null data segment was used to reference memory.
Page fault, #PF                              X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                         X          An unaligned memory reference was performed while alignment checking was enabled.
UD0, UD1, UD2 Undefined Operation


These opcodes generate an invalid opcode exception. Unlike other undefined opcodes that may be 
defined as legal instructions in the future, these opcodes are intended to stay undefined. On some 
AMD64 processor implementations, UD1 may report an invalid opcode exception regardless of 
whether fetching the ModRM byte could trigger a paging or segmentation exception. 


Mnemonic   Opcode     Description
UD0        0F FF      Raise an invalid opcode exception.
UD1        0F B9 /r   Raise an invalid opcode exception.
UD2        0F 0B      Raise an invalid opcode exception.


Related Instructions 
None 
rFLAGS Affected 


None 


Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X             X          This instruction is not recognized.
WRFSBASE Write FS.base
WRGSBASE Write GS.base 


Writes the base field of the FS or GS segment descriptor with the value contained in the register
operand. When supported and enabled, these instructions can be executed at any processor privilege
level. They are defined only in 64-bit mode. The address written to the base field must be in
canonical form; otherwise a #GP fault occurs.
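
The canonical-form requirement can be illustrated with a small Python model. This is only a sketch: it assumes a 48-bit implemented virtual-address width (the actual width is implementation-dependent), and the function names are hypothetical, not part of any AMD-defined interface.

```python
MASK64 = (1 << 64) - 1


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Return True if bits 63:(va_bits-1) of addr are all 0 or all 1.

    va_bits = 48 is an assumption made for this sketch.
    """
    addr &= MASK64
    upper = addr >> (va_bits - 1)               # bits 63 .. va_bits-1
    all_ones = (1 << (64 - va_bits + 1)) - 1    # same field, fully set
    return upper == 0 or upper == all_ones


def wrfsbase64(value: int, va_bits: int = 48) -> int:
    """Model of the 64-bit form: return the new FS.base or signal #GP."""
    if not is_canonical(value, va_bits):
        raise ValueError("#GP: non-canonical segment base address")
    return value & MASK64
```

For example, under the 48-bit assumption, `0x00007FFFFFFFFFFF` and `0xFFFF800000000000` are accepted, while `0x0000800000000000` is rejected.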


System software must set the FSGSBASE bit (bit 16) of CR4 to enable the WRFSBASE and 
WRGSBASE instructions. 


Support for these instructions is indicated by CPUID Fn0000_0007_EBX_x0[FSGSBASE] = 1.


For more information on using the CPUID instruction, see the instruction reference page for the 
CPUID instruction on page 165. For a description of all feature flags related to instruction subset 
support, see Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. 


Mnemonic        Opcode             Description
WRFSBASE reg32  F3 0F AE /2        Copy the contents of the specified 32-bit general-purpose register to the lower 32 bits of FS.base.
WRFSBASE reg64  F3 REX.W 0F AE /2  Copy the contents of the specified 64-bit general-purpose register to FS.base.
WRGSBASE reg32  F3 0F AE /3        Copy the contents of the specified 32-bit general-purpose register to the lower 32 bits of GS.base.
WRGSBASE reg64  F3 REX.W 0F AE /3  Copy the contents of the specified 64-bit general-purpose register to GS.base.


Related Instructions 


RDFSBASE, RDGSBASE 


rFLAGS Affected 

None. 

Exceptions

Exception | Legacy | Compatibility | 64-bit | Cause of Exception
#UD       | X      | X             |        | Instruction is not valid in compatibility or legacy modes.
          |        |               | X      | Instruction not supported as indicated by CPUID Fn0000_0007_EBX_x0[FSGSBASE] = 0 or, if supported, not enabled in CR4.
#GP       |        |               | X      | Attempt to write non-canonical address to segment base address.
XADD Exchange and Add


Exchanges the contents of a register (second operand) with the contents of a register or memory 
location (first operand), computes the sum of the two values, and stores the result in the first operand 
location. 
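
The exchange-and-add behavior can be sketched in Python as follows. The function name and the tuple it returns are illustrative choices for this sketch only; the carry value models the CF result of the addition.

```python
def xadd(dest: int, src: int, width: int = 32):
    """Return (new_dest, new_src, carry) for XADD dest, src at the given width."""
    mask = (1 << width) - 1
    dest &= mask
    src &= mask
    total = dest + src
    new_src = dest              # the source register receives the old destination
    new_dest = total & mask     # the destination receives the truncated sum
    carry = total > mask        # unsigned overflow out of the operand width
    return new_dest, new_src, carry
```

Because the old destination value comes back in the source register, the locked memory form behaves like a fetch-and-add primitive.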


The forms of the XADD instruction that write to memory support the LOCK prefix. For details about 
the LOCK prefix, see “Lock Prefix” on page 11. 


Mnemonic               Opcode    Description
XADD reg/mem8, reg8    0F C0 /r  Exchange the contents of an 8-bit register with the contents of an 8-bit destination register or memory operand and load their sum into the destination.
XADD reg/mem16, reg16  0F C1 /r  Exchange the contents of a 16-bit register with the contents of a 16-bit destination register or memory operand and load their sum into the destination.
XADD reg/mem32, reg32  0F C1 /r  Exchange the contents of a 32-bit register with the contents of a 32-bit destination register or memory operand and load their sum into the destination.
XADD reg/mem64, reg64  0F C1 /r  Exchange the contents of a 64-bit register with the contents of a 64-bit destination register or memory operand and load their sum into the destination.


Related Instructions 


None 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 
Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | The destination operand was in a non-writable segment.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
XCHG Exchange


Exchanges the contents of the two operands. The operands can be two general-purpose registers or a 
register and a memory location. If either operand references memory, the processor locks 
automatically, whether or not the LOCK prefix is used and independently of the value of IOPL. For 
details about the LOCK prefix, see “Lock Prefix” on page 11. 


The x86 architecture commonly uses the XCHG EAX, EAX instruction (opcode 90h) as a one-byte 
NOP. In 64-bit mode, the processor treats opcode 90h as a true NOP only if it would exchange rAX 
with itself. Without this special handling, the instruction would zero-extend the upper 32 bits of RAX, 
and thus it would not be a true no-operation. Opcode 90h can still be used to exchange rAX and r8 if
the appropriate REX prefix is used. 


This special handling does not apply to the two-byte ModRM form of the XCHG instruction. 
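
The special handling of opcode 90h in 64-bit mode can be modeled roughly as shown here. This is a simplified sketch: the register dictionary, the function name, and the `rex_b`/`rex_w` parameters are assumptions made for illustration, and the 16-bit operand-size override is ignored.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def exec_opcode_90(regs: dict, rex_b: bool = False, rex_w: bool = False) -> None:
    """Model opcode 90h in 64-bit mode.

    regs is a hypothetical dict holding 64-bit values under 'rax' and 'r8'.
    """
    if not rex_b:
        # The encoding would exchange rAX with itself: treated as a true NOP,
        # so the upper 32 bits of RAX are left untouched.
        return
    a, b = regs["rax"], regs["r8"]
    if rex_w:
        regs["rax"], regs["r8"] = b & MASK64, a & MASK64   # 64-bit exchange
    else:
        # A 32-bit exchange zero-extends both results to 64 bits.
        regs["rax"], regs["r8"] = b & MASK32, a & MASK32
```

Without the NOP rule, the first branch would have to zero the upper half of RAX, which is exactly the side effect the text describes as unacceptable for a no-operation.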


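Mnemonic               Opcode  Description
XCHG AX, reg16         90 +rw  Exchange the contents of the AX register with the contents of a 16-bit register.
XCHG reg16, AX         90 +rw  Exchange the contents of a 16-bit register with the contents of the AX register.
XCHG EAX, reg32        90 +rd  Exchange the contents of the EAX register with the contents of a 32-bit register.
XCHG reg32, EAX        90 +rd  Exchange the contents of a 32-bit register with the contents of the EAX register.
XCHG RAX, reg64        90 +rq  Exchange the contents of the RAX register with the contents of a 64-bit register.
XCHG reg64, RAX        90 +rq  Exchange the contents of a 64-bit register with the contents of the RAX register.
XCHG reg/mem8, reg8    86 /r   Exchange the contents of an 8-bit register with the contents of an 8-bit register or memory operand.
XCHG reg8, reg/mem8    86 /r   Exchange the contents of an 8-bit register or memory operand with the contents of an 8-bit register.
XCHG reg/mem16, reg16  87 /r   Exchange the contents of a 16-bit register with the contents of a 16-bit register or memory operand.
XCHG reg16, reg/mem16  87 /r   Exchange the contents of a 16-bit register or memory operand with the contents of a 16-bit register.
XCHG reg/mem32, reg32  87 /r   Exchange the contents of a 32-bit register with the contents of a 32-bit register or memory operand.
XCHG reg32, reg/mem32  87 /r   Exchange the contents of a 32-bit register or memory operand with the contents of a 32-bit register.
XCHG reg/mem64, reg64  87 /r   Exchange the contents of a 64-bit register with the contents of a 64-bit register or memory operand.
XCHG reg64, reg/mem64  87 /r   Exchange the contents of a 64-bit register or memory operand with the contents of a 64-bit register.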
Related Instructions

BSWAP, XADD

rFLAGS Affected

None

Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | The source or destination operand was in a non-writable segment.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
XLAT Translate Table Index
XLATB 


Uses the unsigned integer in the AL register as an offset into a table and copies the contents of the table 
entry at that location to the AL register. 


The instruction uses seg:[rBX] as the base address of the table. The value of seg defaults to the DS 
segment, but may be overridden by a segment prefix. 


This instruction writes AL without changing RAX[63:8]. This instruction ignores operand size. 


The single-operand form of the XLAT instruction uses the operand to document the segment and 
address size attribute, but it uses the base address specified by the rBX register. 


This instruction is often used to translate data from one format (such as ASCII) to another (such as
EBCDIC).
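
The table lookup can be sketched in Python. The model below treats memory as a flat `bytes` object and ignores the segment base and address-size wrapping, both of which are simplifications assumed for this sketch; the function name is hypothetical.

```python
def xlat(rax: int, rbx: int, memory: bytes) -> int:
    """Return the new RAX value after a modeled XLAT."""
    al = rax & 0xFF                 # AL is used as an unsigned index
    entry = memory[rbx + al]        # byte at [rBX + AL]
    return (rax & ~0xFF) | entry    # only AL changes; RAX[63:8] is preserved
```

A caller would place a 256-byte translation table at the address held in rBX and load the byte to translate into AL before each lookup.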


Mnemonic   Opcode  Description
XLAT mem8  D7      Set AL to the contents of DS:[rBX + unsigned AL].
XLATB      D7      Set AL to the contents of DS:[rBX + unsigned AL].


Related Instructions

None

rFLAGS Affected

None

Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
XOR Logical Exclusive OR


Performs a bit-wise logical xor operation on both operands and stores the result in the first operand 
location. The first operand can be a register or memory location. The second operand can be an 
immediate value, a register, or a memory location. XOR-ing a register with itself clears the register. 
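
As a rough illustration, the result and the defined status flags of an XOR can be modeled in Python. The function name and the dictionary layout are choices made for this sketch; AF is left out because the instruction leaves it undefined.

```python
def xor_flags(dest: int, src: int, width: int = 32):
    """Return (result, flags) for XOR dest, src at the given operand width."""
    mask = (1 << width) - 1
    result = (dest ^ src) & mask
    low_byte_ones = bin(result & 0xFF).count("1")
    flags = {
        "OF": 0,                                # always cleared
        "CF": 0,                                # always cleared
        "ZF": int(result == 0),                 # set when the result is zero
        "SF": (result >> (width - 1)) & 1,      # most significant result bit
        "PF": int(low_byte_ones % 2 == 0),      # even parity of the low byte
    }
    return result, flags
```

XOR-ing a value with itself yields a zero result with ZF set, which is why the operation is commonly used to clear a register.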


The forms of the XOR instruction that write to memory support the LOCK prefix. For details about the
LOCK prefix, see “Lock Prefix” on page 11.


The instruction performs the following operation for each bit:

X | Y | X XOR Y
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0


Mnemonic               Opcode    Description
XOR AL, imm8           34 ib     XOR the contents of AL with an immediate 8-bit operand and store the result in AL.
XOR AX, imm16          35 iw     XOR the contents of AX with an immediate 16-bit operand and store the result in AX.
XOR EAX, imm32         35 id     XOR the contents of EAX with an immediate 32-bit operand and store the result in EAX.
XOR RAX, imm32         35 id     XOR the contents of RAX with a sign-extended immediate 32-bit operand and store the result in RAX.
XOR reg/mem8, imm8     80 /6 ib  XOR the contents of an 8-bit destination register or memory operand with an 8-bit immediate value and store the result in the destination.
XOR reg/mem16, imm16   81 /6 iw  XOR the contents of a 16-bit destination register or memory operand with a 16-bit immediate value and store the result in the destination.
XOR reg/mem32, imm32   81 /6 id  XOR the contents of a 32-bit destination register or memory operand with a 32-bit immediate value and store the result in the destination.
XOR reg/mem64, imm32   81 /6 id  XOR the contents of a 64-bit destination register or memory operand with a sign-extended 32-bit immediate value and store the result in the destination.
XOR reg/mem16, imm8    83 /6 ib  XOR the contents of a 16-bit destination register or memory operand with a sign-extended 8-bit immediate value and store the result in the destination.
XOR reg/mem32, imm8    83 /6 ib  XOR the contents of a 32-bit destination register or memory operand with a sign-extended 8-bit immediate value and store the result in the destination.
XOR reg/mem64, imm8    83 /6 ib  XOR the contents of a 64-bit destination register or memory operand with a sign-extended 8-bit immediate value and store the result in the destination.
XOR reg/mem8, reg8     30 /r     XOR the contents of an 8-bit destination register or memory operand with the contents of an 8-bit register and store the result in the destination.
XOR reg/mem16, reg16   31 /r     XOR the contents of a 16-bit destination register or memory operand with the contents of a 16-bit register and store the result in the destination.
XOR reg/mem32, reg32   31 /r     XOR the contents of a 32-bit destination register or memory operand with the contents of a 32-bit register and store the result in the destination.
XOR reg/mem64, reg64   31 /r     XOR the contents of a 64-bit destination register or memory operand with the contents of a 64-bit register and store the result in the destination.
XOR reg8, reg/mem8     32 /r     XOR the contents of an 8-bit destination register with the contents of an 8-bit register or memory operand and store the result in the destination.
XOR reg16, reg/mem16   33 /r     XOR the contents of a 16-bit destination register with the contents of a 16-bit register or memory operand and store the result in the destination.
XOR reg32, reg/mem32   33 /r     XOR the contents of a 32-bit destination register with the contents of a 32-bit register or memory operand and store the result in the destination.
XOR reg64, reg/mem64   33 /r     XOR the contents of a 64-bit destination register with the contents of a 64-bit register or memory operand and store the result in the destination.


Related Instructions 


OR, AND, NOT, NEG 


rFLAGS Affected 


ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | M  | M  | U  | M  | 0
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 
Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS              | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X    | X            | X         | A memory address exceeded a data segment limit or was non-canonical.
                        |      |              | X         | The destination operand was in a non-writable segment.
                        |      |              | X         | A null data segment was used to reference memory.
Page fault, #PF         |      | X            | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      | X            | X         | An unaligned memory reference was performed while alignment checking was enabled.
4 System Instruction Reference


This chapter describes the function, mnemonic syntax, opcodes, affected flags, and possible 
exceptions generated by the system instructions. System instructions are used to establish the 
processor operating mode, access processor resources, handle program and system errors, manage 
memory, and instantiate a virtual machine. Most of these instructions can only be executed by 
privileged software, such as the operating system or a Virtual Machine Monitor (VMM), also known 
as a hypervisor. Only system instructions can access certain processor resources, such as the control 
registers, model-specific registers, and debug registers. 


Most system instructions are supported in all hardware implementations of the AMD64 architecture. 
The table below lists instructions that may not be supported on a given processor implementation. 
System software must execute the CPUID instruction using the function number listed to determine 
support prior to using these instructions. 


Table 4-1. System Instruction Support Indicated by CPUID Feature Bits 


Instruction                              CPUID Feature Bit                Register[Bit]
CET_SS                                   0000_0007_0                      ECX[7]
CLAC, STAC                               0000_0007_0                      EBX[20]
Long Mode and Long Mode instructions     8000_0001_EDX[LM]                EDX[29]
INVPCID                                  0000_0007_0                      EBX[10]
INVLPGB, TLBSYNC                         8000_0008_EBX[INVLPGB]           EBX[3]
MONITOR, MWAIT                           0000_0001_ECX[MONITOR]           ECX[3]
RDPKRU, WRPKRU                           0000_0007_0                      ECX[4]
PSMASH, PVALIDATE, RMPADJUST, RMPUPDATE  8000_001F_EAX[SNP]               EAX[4]
RDMSR, WRMSR                             0000_0001_EDX[MSR]               EDX[5]
RDTSCP                                   8000_0001_EDX[RDTSCP]            EDX[27]
SKINIT, STGI                             8000_0001_ECX[SKINIT]            ECX[12]
SVM Architecture and instructions        8000_0001_ECX[SVM]               ECX[2]
SYSCALL, SYSRET                          8000_0001_EDX[SysCallSysRet]     EDX[11]
SYSENTER, SYSEXIT                        0000_0001_EDX[SysEnterSysExit]   EDX[11]
VMGEXIT                                  8000_001F[SEV-ES]                EAX[3]
WBNOINVD                                 8000_0008_EBX[WBNOINVD]          EBX[9]
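
A feature test against entries of Table 4-1 might be sketched as below. The code does not execute CPUID; it assumes the register values were captured elsewhere and passed in as a dictionary, and the names `FEATURE_BITS`, `is_supported`, and `cpuid_results` are hypothetical. Only a few table rows are included, and for function 0000_0007 the values are assumed to come from subfunction 0.

```python
# (CPUID function, register, bit) for a few rows copied from Table 4-1.
FEATURE_BITS = {
    "CLAC/STAC": (0x0000_0007, "ebx", 20),
    "INVPCID": (0x0000_0007, "ebx", 10),
    "MONITOR/MWAIT": (0x0000_0001, "ecx", 3),
    "RDTSCP": (0x8000_0001, "edx", 27),
    "SYSCALL/SYSRET": (0x8000_0001, "edx", 11),
    "WBNOINVD": (0x8000_0008, "ebx", 9),
}


def is_supported(name: str, cpuid_results: dict) -> bool:
    """Test the feature bit for name.

    cpuid_results maps a CPUID function number to a dict of register values,
    for example {0x8000_0001: {"edx": 0x0800_0000}}. Missing entries are
    treated as "not supported".
    """
    function, register, bit = FEATURE_BITS[name]
    regs = cpuid_results.get(function, {})
    return bool((regs.get(register, 0) >> bit) & 1)
```

Keeping the bit positions in one table makes it easy to check each entry against the manual rather than scattering shift constants through the code.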


There are also several other CPUID feature bits that indicate support for certain paging functions, 
virtual-mode extensions, machine-check exceptions, advanced programmable interrupt control 
(APIC), memory-type range registers (MTRRs), etc. 
For more information on using the CPUID instruction, see the reference page for the CPUID
instruction on page 165. For a comprehensive list of all instruction support feature flags, see 
Appendix D, “Instruction Subsets and CPUID Feature Flags,” on page 587. For a comprehensive list 
of all defined CPUID feature numbers and return values, see Appendix E, “Obtaining Processor 
Information Via the CPUID Instruction,” on page 593. 

For further information about the system instructions and register resources, see: 

• “System Instructions” in Volume 2.
• “Summary of Registers and Data Types” on page 38.
• “Notation” on page 53.
• “Instruction Prefixes” on page 5.
ARPL Adjust Requestor Privilege Level


Compares the requestor privilege level (RPL) fields of two segment selectors in the source and 
destination operands of the instruction. If the RPL field of the destination operand is less than the RPL 
field of the segment selector in the source register, then the zero flag is set and the RPL field of the 
destination operand is increased to match that of the source operand. Otherwise, the destination 
operand remains unchanged and the zero flag is cleared. 
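
The comparison described above can be sketched in Python. The function name and return convention are illustrative only; the RPL is taken to be bits 1:0 of each 16-bit selector.

```python
def arpl(dest_sel: int, src_sel: int):
    """Return (new_dest_selector, zf) for a modeled ARPL."""
    dest_rpl = dest_sel & 0x3      # RPL field of the destination selector
    src_rpl = src_sel & 0x3        # RPL field of the source selector
    if dest_rpl < src_rpl:
        # Raise the destination RPL to match the source and set ZF.
        return (dest_sel & 0xFFFC) | src_rpl, 1
    # Otherwise leave the destination unchanged and clear ZF.
    return dest_sel & 0xFFFF, 0
```

For instance, a destination selector of 0010h checked against a source selector of 0023h becomes 0013h with ZF set.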


The destination operand can be either a 16-bit register or memory location; the source operand must be 
a 16-bit register. 


The ARPL instruction is intended for use by operating-system procedures to adjust the RPL of a 
segment selector that has been passed to the operating system by an application program to match the 
privilege level of the application program. The segment selector passed to the operating system is 
placed in the destination operand and the segment selector for the code segment of the application 
program is placed in the source operand. The RPL field in the source operand represents the privilege 
level of the application program. The ARPL instruction then ensures that the RPL of the segment
selector received by the operating system is no lower than the privilege level of the application 
program. 


See “Adjusting Access Rights” in Volume 2, for more information on access rights. 


In 64-bit mode, this opcode (63H) is used for the MOVSXD instruction. 


Mnemonic               Opcode  Description
ARPL reg/mem16, reg16  63 /r   Adjust the RPL of a destination segment selector to a level not less than the RPL of the segment selector specified in the 16-bit source register. (Invalid in 64-bit mode.)


Related Instructions

LAR, LSL, VERR, VERW

rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       |    |    |    |    |    | M  |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or cleared to zero is M (modified). Unaffected flags
are blank. Undefined flags are U.
Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD     | X    | X            |           | This instruction is only recognized in protected legacy and compatibility mode.
Stack, #SS              |      |              | X         | A memory address exceeded the stack segment limit.
General protection, #GP |      |              | X         | A memory address exceeded a data segment limit.
                        |      |              | X         | The destination operand was in a non-writable segment.
                        |      |              | X         | A null segment selector was used to reference memory.
Page fault, #PF         |      |              | X         | A page fault resulted from the execution of the instruction.
Alignment check, #AC    |      |              | X         | An unaligned memory reference was performed while alignment checking was enabled.
CLAC Clear Alignment Check Flag


Sets the Alignment Check flag in the rFLAGS register to zero. Support for the CLAC instruction is
indicated by CPUID Fn07_EBX[20] = 1. For more information on using the CPUID instruction, see
the description of the CPUID instruction on page 165.


Mnemonic  Opcode    Description
CLAC      0F 01 CA  Clear AC Flag

Related Instructions 

STAC 

rFLAGS Affected 

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     | 0  |    |    |    |       |    |    |    |    |    |    |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are
blank. Undefined flags are U.


Exceptions

Exception             Real / Virtual 8086 / Protected   Cause of Exception
Invalid opcode, #UD   X X X                             Instruction not supported by CPUID
                      X X                               Instruction is not supported in virtual mode
                      X X                               Lock prefix (F0h) preceding opcode.
                      X                                 CPL was not 0
CLGI Clear Global Interrupt Flag


Clears the global interrupt flag (GIF). While GIF is zero, all external interrupts are disabled. 


This is a Secure Virtual Machine instruction. Support for the SVM architecture and the SVM 
instructions is indicated by CPUID Fn8000_0001_ECX[SVM] = 1. For more information on using the 
CPUID instruction, see the reference page for the CPUID instruction on page 165. 


This instruction generates a #UD exception if SVM is not enabled. See “Enabling SVM” in AMD64
Architecture Programmer’s Manual Volume 2: System Programming, order# 24593.


Mnemonic  Opcode    Description
CLGI      0F 01 DD  Clears the global interrupt flag (GIF).


Related Instructions

STGI

rFLAGS Affected

None.

Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD     | X    | X            | X         | The SVM instructions are not supported as indicated by CPUID Fn8000_0001_ECX[SVM] = 0.
                        |      |              | X         | Secure Virtual Machine was not enabled (EFER.SVME=0).
                        | X    | X            |           | Instruction is only recognized in protected mode.
General protection, #GP |      |              | X         | CPL was not 0.
CLI Clear Interrupt Flag


Clears the interrupt flag (IF) in the rFLAGS register to zero, thereby masking external interrupts 
received on the INTR input. Interrupts received on the non-maskable interrupt (NMI) input are not 
affected by this instruction. 


In real mode, this instruction clears IF to 0. 


In protected mode and virtual-8086-mode, this instruction is IOPL-sensitive. If the CPL is less than or 
equal to the rFLAGS.IOPL field, the instruction clears IF to 0. 


In protected mode, if IOPL < 3, CPL = 3, and protected mode virtual interrupts are enabled (CR4.PVI
= 1), then the instruction instead clears rFLAGS.VIF to 0. If none of these conditions apply, the
processor raises a general-protection exception (#GP). For more information, see “Protected Mode
Virtual Interrupts” in Volume 2.


In virtual-8086 mode, if IOPL < 3 and the virtual-8086-mode extensions are enabled (CR4.VME = 1), 
the CLI instruction clears the virtual interrupt flag (rFLAGS.VIF) to 0 instead. 


See “Virtual-8086 Mode Extensions” in Volume 2 for more information about IOPL-sensitive 
instructions. 


Mnemonic  Opcode  Description
CLI       FA      Clear the interrupt flag (IF) to zero.

Action

IF (CPL <= IOPL)
    RFLAGS.IF = 0

ELSEIF (((VIRTUAL_MODE) && (CR4.VME == 1))
        || ((PROTECTED_MODE) && (CR4.PVI == 1) && (CPL == 3)))
    RFLAGS.VIF = 0

ELSE
    EXCEPTION [#GP(0)]
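
The Action pseudocode can be transcribed into a small executable Python model. The mode labels, the flags dictionary, and the `GeneralProtection` exception class are hypothetical stand-ins chosen for this sketch.

```python
class GeneralProtection(Exception):
    """Stands in for #GP(0) in this model."""


def cli(flags: dict, cpl: int, iopl: int, mode: str,
        vme: bool = False, pvi: bool = False) -> None:
    """Model CLI. mode is 'real', 'virtual', or 'protected' (assumed labels)."""
    if cpl <= iopl:
        flags["IF"] = 0            # sufficient privilege: clear IF
    elif (mode == "virtual" and vme) or (
        mode == "protected" and pvi and cpl == 3
    ):
        flags["VIF"] = 0           # virtual-interrupt path: clear VIF instead
    else:
        raise GeneralProtection("#GP(0)")
```

In this model real mode always takes the first branch when called with `cpl=0`, which matches the statement that CLI simply clears IF in real mode.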


Related Instructions 


STI 
rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     | M   |    |    |    |    |       |    |    | M  |    |    |    |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or cleared to zero is M (modified). Unaffected flags are
blank. Undefined flags are U.


Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
General protection, #GP |      | X            |           | The CPL was greater than the IOPL and virtual mode extensions are not enabled (CR4.VME = 0).
                        |      |              | X         | The CPL was greater than the IOPL and either the CPL was not 3 or protected mode virtual interrupts were not enabled (CR4.PVI = 0).
CLTS Clear Task-Switched Flag in CR0


Clears the task-switched (TS) flag in the CR0 register to 0. The processor sets the TS flag on each task
switch. The CLTS instruction is intended to facilitate the synchronization of FPU context saves during
multitasking operations.


This instruction can only be used if the current privilege level is 0.

See “System-Control Registers” in Volume 2 for more information on FPU synchronization and the
TS flag.


Mnemonic  Opcode  Description
CLTS      0F 06   Clear the task-switched (TS) flag in CR0 to 0.


Related Instructions

LMSW, MOV CRn

rFLAGS Affected

None

Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
General protection, #GP |      | X            | X         | CPL was not 0.
CLRSSBSY Clear Shadow Stack Busy


Validates the busy (in use) shadow stack token pointed to by the memory operand and clears the token's
busy bit. If the token validation checks pass, CF is cleared to 0 and SSP is cleared to 0. If the token
validation checks fail, CF is set to 1 and the token and SSP are not modified.


CLRSSBSY is a privileged instruction and must be executed with CPL=0; otherwise a #GP exception is
generated. If shadow stacks are not enabled at the supervisor level, a #UD exception is generated.


Mnemonic        Opcode       Description
CLRSSBSY mem64  F3 0F AE /6  Validate shadow stack token and clear busy bit.
Actions

// see "Pseudocode Definition" on page 57

IF (CR4.CET == 0)
    EXCEPTION [#UD]

IF (S_CET.SH_STK_EN == 0)
    EXCEPTION [#UD]

IF (CPL != 0)
    EXCEPTION [#GP(0)]

temp_linAdr = Linear_Address(mem64)

IF (temp_linAdr is not 8-byte aligned)
    EXCEPTION [#GP(0)]

bool INVALID_TOKEN = FALSE

< start atomic section >
temp_Token = SSTK_READ_MEM.q [temp_linAdr]    // fetch token with locked read

IF ((temp_Token AND 0x01) != 1)
    INVALID_TOKEN = TRUE                      // token busy bit must be set

IF ((temp_Token AND ~0x01) != temp_linAdr)
    INVALID_TOKEN = TRUE                      // address in token must equal
                                              // linear address of mem64
IF (!INVALID_TOKEN)
    temp_Token = temp_Token AND ~0x01         // valid token, clear busy bit

SSTK_WRITE_MEM.q [temp_linAdr] = temp_Token   // write back token and unlock
< end atomic section >

RFLAGS.ZF,PF,AF,OF,SF = 0
IF (INVALID_TOKEN)
    RFLAGS.CF = 1                             // set CF if token not valid
ELSE
{
    RFLAGS.CF = 0                             // else clear CF
    SSP = 0                                   // and set SSP = 0
}
EXIT
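
The token check at the core of the pseudocode can be sketched in Python. This model is deliberately reduced: it omits the CR4.CET, S_CET, and CPL checks as well as the atomic locked access, and it represents shadow stack memory as a plain dictionary. The function name and return convention are hypothetical.

```python
def clrssbsy(memory: dict, lin_addr: int, ssp: int):
    """Return (cf, new_ssp) for a modeled CLRSSBSY.

    memory maps 8-byte-aligned linear addresses to 64-bit token values.
    """
    if lin_addr % 8 != 0:
        raise ValueError("#GP(0): operand is not 8-byte aligned")
    token = memory[lin_addr]
    busy_set = (token & 0x01) == 1                  # busy bit must be set
    address_matches = (token & ~0x01) == lin_addr   # token must hold its own address
    if busy_set and address_matches:
        memory[lin_addr] = token & ~0x01            # valid token: clear busy bit
        return 0, 0                                 # CF = 0 and SSP = 0
    return 1, ssp                                   # invalid: CF = 1, nothing modified
```

A token is accepted only when it is both marked busy and self-referential, so a stale or forged token at the same address leaves the stored value and SSP untouched.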


Related Instructions

SETSSBSY

rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | 0  |    |    |    | 0  | 0  | 0  | 0  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions

Exception (vector)      | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD     | X    | X            |           | Instruction is only recognized in protected mode.
                        |      |              | X         | CR4.CET = 0
                        |      |              | X         | Shadow stacks not enabled at supervisor level
General protection, #GP |      |              | X         | CPL != 0
                        |      |              | X         | The linear address is not 8-byte aligned.
                        |      |              | X         | A memory address exceeded a data segment limit.
                        |      |              | X         | In long mode, the address of the memory operand was non-canonical.
                        |      |              | X         | A null data segment was used to reference memory.
                        |      |              | X         | A non-writable data segment was used.
                        |      |              | X         | An execute-only code segment was used to reference memory.
Page fault, #PF         |      |              | X         | The linear address is not a supervisor shadow stack page in the OS page tables.
                        |      |              | X         | A page fault resulted from the execution of the instruction.
HLT Halt


Causes the microprocessor to halt instruction execution and enter the HALT state. Entering the HALT 
state puts the processor in low-power mode. Execution resumes when an unmasked hardware interrupt 
(INTR), non-maskable interrupt (NMI), system management interrupt (SMI), RESET, or INIT occurs. 


If an INTR, NMI, or SMI is used to resume execution after a HLT instruction, the saved instruction 
pointer points to the instruction following the HLT instruction. 


Before executing a HLT instruction, hardware interrupts should be enabled. If rFLAGS.IF = 0, the 
system will remain in a HALT state until an NMI, SMI, RESET, or INIT occurs. 


If an SMI brings the processor out of the HALT state, the SMI handler can decide whether to return to 
the HALT state or not. See Volume 2: System Programming, for information on SMIs. 


Current privilege level must be 0 to execute this instruction.


Mnemonic  Opcode  Description
HLT       F4      Halt instruction execution.


Related Instructions

STI, CLI

rFLAGS Affected

None

Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
General protection, #GP |      | X            | X         | CPL was not 0.
INCSSP Increment Shadow Stack Pointer


Increments SSP by the operand size of the instruction multiplied by the unsigned 8-bit value in bits 
[7:0] of the register operand. The operand size is 8 bytes in 64-bit mode (when REX.W = 1) and is 4 
bytes in all other cases. 


Before incrementing SSP, the first and last elements of the shadow stack in the range specified by the 
register operand are read and discarded. 


Mnemonic       Opcode        Description
INCSSPD reg32  F3 0F AE /05  Increment SSP by 4*(reg32[7:0]).
INCSSPQ reg64  F3 0F AE /05  Increment SSP by 8*(reg64[7:0]).

Action

IF ((CPL == 3) && (!SSTK_USER_ENABLED))
    EXCEPTION [#UD]

ELSEIF ((CPL < 3) && (!SSTK_SUPV_ENABLED))
    EXCEPTION [#UD]

IF (OPERAND_SIZE == 64)
{
    temp_numItems = (reg64[7:0] == 0) ? 1 : reg64[7:0]
    temp = SSTK_READ_MEM.q [SSP]                         // touch TOS and last
    temp = SSTK_READ_MEM.q [SSP + temp_numItems*8 - 8]   // element in range
    SSP = SSP + reg64[7:0]*8                             // increment SSP
}
ELSE
{
    temp_numItems = (reg32[7:0] == 0) ? 1 : reg32[7:0]
    temp = SSTK_READ_MEM.d [SSP]                         // touch TOS and last
    temp = SSTK_READ_MEM.d [SSP + temp_numItems*4 - 4]   // element in range
    SSP = SSP + reg32[7:0]*4                             // increment SSP
}
EXIT
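
The address arithmetic in the Action pseudocode can be sketched in Python. The enable checks are omitted, and `read_mem` is a hypothetical callback standing in for the shadow stack reads, so that the touched addresses can be observed.

```python
def incssp(ssp: int, reg: int, operand_size: int, read_mem) -> int:
    """Return the new SSP for a modeled INCSSP.

    read_mem(addr, size) is called for the first and last element in range.
    """
    size = 8 if operand_size == 64 else 4
    count = reg & 0xFF                           # only bits 7:0 of the register are used
    items = 1 if count == 0 else count           # a count of 0 still touches one element
    read_mem(ssp, size)                          # touch top of stack
    read_mem(ssp + items * size - size, size)    # touch last element in range
    return ssp + count * size                    # increment SSP
```

Note that a count of zero still reads the element at the top of the stack even though SSP is left unchanged.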
Related Instructions

RDSSP, RSTORSSP

rFLAGS Affected

None

Exceptions

Exception               | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD     | X    | X            |           | Instruction is only recognized in protected mode.
                        |      |              | X         | CR4.CET = 0
                        |      |              | X         | Shadow stacks are not enabled at the current privilege level.
Page fault, #PF         |      |              | X         | A page fault occurred when touching the first or last element of the shadow stack in the range specified.
                        |      |              | X         | The first or last element in the range specified is not in a shadow stack page.
General protection, #GP |      |              | X         | In long mode, the address of the memory operand was non-canonical.
                        |      |              | X         | A memory address exceeded a data segment limit.
                        |      |              | X         | A null data segment was used to reference memory.
INT 3 Interrupt to Debug Vector


Calls the debug exception handler. This instruction maps to a 1-byte opcode (CC) that raises a #BP 
exception. The INT 3 instruction is normally used by debug software to set instruction breakpoints by 
replacing the first byte of the instruction opcode bytes with the INT 3 opcode. 


This one-byte INT 3 instruction behaves differently from the two-byte INT 3 instruction (opcode CD
03) (see “INT” in Chapter 3 “General Purpose Instructions” for further information) in two ways:

• The #BP exception is handled without any IOPL checking in virtual-8086 mode. (IOPL mismatches
  will not trigger an exception.)
• In VME mode, the #BP exception is not redirected via the interrupt redirection table. (Instead, it is
  handled by a protected mode handler.)


Mnemonic  Opcode  Description
INT 3     CC      Trap to debugger at Interrupt 3.


For complete descriptions of the steps performed by INT instructions, see the following:

• Legacy-Mode Interrupts: “Legacy Protected-Mode Interrupt Control Transfers” in Volume 2.
• Long-Mode Interrupts: “Long-Mode Interrupt Control Transfers” in Volume 2.

Action

// Refer to INT instruction’s Action section for the details on INT_N_REAL,
// INT_N_PROTECTED, and INT_N_VIRTUAL_TO_PROTECTED.

INT3_START:

IF (REAL_MODE)
    INT_N_REAL                    // N = 3

ELSEIF (PROTECTED_MODE)
    INT_N_PROTECTED               // N = 3

ELSE // VIRTUAL_MODE
    INT_N_VIRTUAL_TO_PROTECTED    // N = 3


Related Instructions

INT, INTO, IRET

rFLAGS Affected

If a task switch occurs, all flags are modified; otherwise, settings are as follows:

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     | M  | 0  | 0  | M  |       |    |    | M  | 0  |    |    |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or cleared to zero is M (modified). Unaffected flags
are blank. Undefined flags are U.


Exceptions

Exception                           | Real | Virtual 8086 | Protected | Cause of Exception
Breakpoint, #BP                     | X    | X            | X         | INT 3 instruction was executed.
Invalid TSS, #TS (selector)         |      | X            | X         | As part of a stack switch, the target stack segment selector or rSP in the TSS was beyond the TSS limit.
                                    |      | X            | X         | As part of a stack switch, the target stack segment selector in the TSS was beyond the limit of the GDT or LDT descriptor table.
                                    |      | X            | X         | As part of a stack switch, the target stack segment selector in the TSS was a null selector.
                                    |      | X            | X         | As part of a stack switch, the target stack segment selector’s TI bit was set, but the LDT selector was a null selector.
                                    |      | X            | X         | As part of a stack switch, the target stack segment selector in the TSS contained a RPL that was not equal to its DPL.
                                    |      | X            | X         | As part of a stack switch, the target stack segment selector in the TSS contained a DPL that was not equal to the CPL of the code segment selector.
                                    |      | X            | X         | As part of a stack switch, the target stack segment selector in the TSS was not a writable segment.
Segment not present, #NP (selector) |      | X            | X         | The accessed code segment, interrupt gate, trap gate, task gate, or TSS was not present.
Stack, #SS                          | X    | X            | X         | A memory address exceeded the stack segment limit or was non-canonical.
Stack, #SS (selector)               |      | X            | X         | After a stack switch, a memory address exceeded the stack segment limit or was non-canonical and a stack switch occurred.
                                    |      | X            | X         | As part of a stack switch, the SS register was loaded with a non-null segment selector and the segment was marked not present.
General protection, #GP             | X    | X            | X         | A memory address exceeded the data segment limit or was non-canonical.
                                    | X    | X            | X         | The target offset exceeded the code segment limit or was non-canonical.
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#AC 


Virtual 
Exception Real| 8086 |Protected Cause of Exception 
Xx Xx x The interrupt vector was beyond the limit of IDT. 
The descriptor in the IDT was not an interrupt, trap, or task 
Xx Xx gate in legacy mode or not a 64-bit interrupt or trap gate in 
long mode. 
Xx Xx The DPL of the interrupt, trap, or task gate descriptor was less 
than the CPL. 
Xx Xx The segment selector specified by the interrupt or trap gate 
General protection, had its TI bit set, but the LDT selector was a null selector. 
#GP Xx Xx The segment descriptor specified by the interrupt or trap gate 
(selector) exceeded the descriptor table limit or was a null selector. 
The segment descriptor specified by the interrupt or trap gate 
Xx Xx was not a code segment in legacy mode, or not a 64-bit code 
segment in long mode. 
Xx The DPL of the segment specified by the interrupt or trap gate 
was greater than the CPL. 
Xx The DPL of the segment specified by the interrupt or trap gate 
pointed was not O or it was a conforming segment. 
Page fault, #PF x x A page fault resulted from the execution of the instruction. 
Alignment check, Xx Xx An unaligned memory reference was performed while 


alignment checking was enabled. 
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INVD Invalidate Caches 


Invalidates all levels of cache associated with this processor. This may or may not include lower level 
caches associated with another processor that shares any level of this processor's cache hierarchy. 


No data is written back to main memory from invalidating the caches. 


CPUID Fn8000_001D_EDX[WBINVD]_xN indicates the behavior of the processor at various levels
of the cache hierarchy. If the feature bit is 0, the instruction causes the invalidation of all lower level 
caches of other processors sharing the designated level of cache. If the feature bit is 1, the instruction 
does not necessarily cause the invalidation of all lower level caches of other processors sharing the 
designated level of cache. See Appendix E, “Obtaining Processor Information Via the CPUID 
Instruction,” on page 593 for more information on using the CPUID function. 


This is a privileged instruction. The current privilege level (CPL) of a procedure invalidating the 
processor’s internal caches must be 0. 


To ensure that data is written back to memory prior to invalidating caches, use the WBINVD
instruction. 


This instruction does not invalidate TLB caches. 


INVD is a serializing instruction. 


Mnemonic Opcode Description

INVD 0F 08 Invalidate internal caches and trigger external cache invalidations.


Related Instructions 


WBINVD, WBNOINVD, CLWB, CLFLUSH 


rFLAGS Affected 


None 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception
General protection, #GP | | X | X | CPL was not 0.
INVLPG Invalidate TLB Entry


Invalidates the TLB entry that would be used for the 1-byte memory operand. 


This instruction invalidates the TLB entry, regardless of the G (Global) bit setting in the associated 
PDE or PTE entry and regardless of the page size (4 Kbytes, 2 Mbytes, 4 Mbytes, or 1 Gbyte). It may 
invalidate any number of additional TLB entries, in addition to the targeted entry. INVLPG only 
invalidates TLB entries tagged with the current PCID and also global pages regardless of PCIDs. If 
PCIDs are disabled (CR4.PCID=0) then the current PCID is zero. 


INVLPG is a serializing instruction and a privileged instruction. The current privilege level must be 0 
to execute this instruction. 


See “Page Translation and Protection” in Volume 2 for more information on page translation. 


Mnemonic Opcode Description

INVLPG mem8 0F 01 /7 Invalidate the TLB entry for the page containing a specified memory location.


Related Instructions 


INVLPGA, INVLPGB, INVPCID, MOV CRn (CR3 and CR4) 


rFLAGS Affected 


None 


Exceptions 


Exception | Real | Virtual 8086 | Protected | Cause of Exception
General protection, #GP | | X | X | CPL was not 0.
INVLPGA Invalidate TLB Entry in a Specified ASID


Invalidates the TLB mapping for a given virtual page and a given ASID. The virtual (linear) address is 
specified in the implicit register operand rAX. The portion of rAX used to form the address is 
determined by the effective address size (current execution mode and optional address size prefix). 
The ASID is taken from ECX. 


The INVLPGA instruction may invalidate any number of additional TLB entries, in addition to the 
targeted entry. 


The INVLPGA instruction is a serializing instruction and a privileged instruction. The current 
privilege level must be 0 to execute this instruction. 


This is a Secure Virtual Machine (SVM) instruction. Support for the SVM architecture and the SVM 
instructions is indicated by CPUID Fn8000_0001_ECX[SVM] = 1. For more information on using the 
CPUID instruction, see the reference page for the CPUID instruction on page 165. 


This instruction generates a #UD exception if SVM is not enabled. See “Enabling SVM” in AMD64
Architecture Programmer’s Manual Volume 2: System Programming, order# 24593.


Mnemonic Opcode Description

INVLPGA rAX, ECX 0F 01 DF Invalidates the TLB mapping for the virtual page specified in rAX and the ASID specified in ECX.


Related Instructions 


INVLPG, INVLPGB, INVPCID 


rFLAGS Affected 
None. 
Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SVM instructions are not supported as indicated by CPUID Fn8000_0001_ECX[SVM] = 0.
 | | | X | Secure Virtual Machine was not enabled (EFER.SVME=0).
 | X | X | | Instruction is only recognized in protected mode.
General protection, #GP | | | X | CPL was not 0.
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INVLPGB Invalidate TLB Entry(s) with Broadcast 


Invalidates the TLB entry or entries specified by the descriptor in the rAX:EDX register pair. 
Invalidations are done both in the local TLB and broadcast to all processors to perform the same 
invalidations. The virtual (linear) address is specified in the implicit register operand rAX. The portion 
of rAX used to form the address is determined by the effective address size. 


The TLB control field is specified in rAX[5:0]. It determines which components of the address (VA, 
PCID, ASID) are valid for comparison in the TLB and whether to include global entries in the 
invalidation process. If rAX[4] is set, only the final translation is invalidated and not the cached upper 
level TLB entries that lead to the final page. This ability may not be possible with all processors in 
which case the bit is ignored. If rAX[5] is set, all nested translations that could be used for guest 
translation selected in rAX[4:0] are flushed. rAX[5] can only be set if CPUID Fn8000_0008_EBX[21]
= 1. ECX provides a count of the number of pages to include in invalidation with the specified address
and the page size at which to increment the specified address. 


The descriptor in rAX has the following format: 


rAX Attributes 
0 Valid VA 
1 Valid PCID 
2 Valid ASID 
3 Include Global 
4 Final Translation Only 
5 Include Nested Translations 

11:6 Reserved, MBZ 

63:12 or 31:12 VA 


rAX[3:0] provides for various types of invalidations. A few examples are listed in the following table, 
but all values are legal. 


rAX[3:0] | Action
0xF | Invalidate all TLB entries that match {ASID, PCID, VA} including Global
0xC | Invalidate all TLB entries that match {ASID} including Global
0xD | Invalidate all TLB entries that match {ASID, VA} including Global
0x4 | Invalidate all TLB entries that match {ASID} excluding Global
0xE | Invalidate all TLB entries that match {ASID, PCID} including Global
0x6 | Invalidate all TLB entries that match {ASID, PCID} excluding Global


The descriptor in EDX has the following format: 


EDX Attributes 
15:0 ASID 

27:16 PCID 

31:28 Reserved, MBZ 


ECX[15:0] contains a count of the number of sequential pages to invalidate in addition to the original 
virtual address, starting from the virtual address specified in rAX. A count of 0 invalidates a single 
page. ECX[31]=0 indicates to increment the virtual address at the 4K boundary. ECX[31]=1 indicates 
to increment the virtual address at the 2M boundary. The maximum count supported is reported in 
CPUID function 8000_0008h, EDX[15:0]. 
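
The register layouts described above can be sketched as a small helper that composes the rAX, EDX, and ECX values for one INVLPGB request. This is only an illustration of the bit fields under stated assumptions: the function name `invlpgb_operands`, its parameters, and the flag constant names are hypothetical, and the sketch does not check the CPUID-reported maximum page count or feature bits.

```python
# rAX[5:0] control bits, named here for illustration only
VALID_VA = 1 << 0
VALID_PCID = 1 << 1
VALID_ASID = 1 << 2
INCLUDE_GLOBAL = 1 << 3
FINAL_ONLY = 1 << 4
INCLUDE_NESTED = 1 << 5


def invlpgb_operands(va, asid=0, pcid=0, flags=0, extra_pages=0, stride_2m=False):
    """Return (rax, edx, ecx) for an INVLPGB request (hypothetical helper)."""
    if flags & ~0x3F:
        raise ValueError("only rAX[5:0] carry control bits; rAX[11:6] must be zero")
    if not 0 <= asid < (1 << 16):
        raise ValueError("ASID occupies EDX[15:0]")
    if not 0 <= pcid < (1 << 12):
        raise ValueError("PCID occupies EDX[27:16]")
    if not 0 <= extra_pages < (1 << 16):
        raise ValueError("page count occupies ECX[15:0]")
    rax = (va & 0xFFFF_FFFF_FFFF_F000) | flags   # VA in bits 63:12, flags in 5:0
    edx = (pcid << 16) | asid                    # EDX[31:28] stay zero
    ecx = (int(stride_2m) << 31) | extra_pages   # ECX[31] selects 2M increments
    return rax, edx, ecx


# Flush 8 consecutive 4K pages (count 7 = 7 additional pages) for one ASID.
rax, edx, ecx = invlpgb_operands(
    0x7F00_1234_5678, asid=5, flags=VALID_VA | VALID_ASID, extra_pages=7
)
```

Note that a count of 0 still invalidates one page, so the helper takes the number of additional pages and not the total.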


This instruction invalidates the TLB entry or entries, regardless of the page size (4 Kbytes, 2 Mbytes, 4 
Mbytes, or 1 Gbyte). It may invalidate any number of additional TLB entries in addition to the targeted 
entry or entries to accomplish the specified function. INVLPGB follows the same rules for cached 
upper TLB entries as INVLPG which is controlled by EFER.TCE. However, since this is a broadcast, 
the invalidation is controlled by the EFER.TCE value on the processor executing the INVLPGB 
instruction. (See Section 3, “Translation Cache Extension” in AMD64 Architecture Programmer’s 
Manual Volume 2 for more information on EFER.TCE.) 


Under the following circumstances, execution of INVLPGB will result in a General Protection fault
(#GP):

• If SVM is disabled, requesting the ASID field with any value but zero, even if the ASID is not
necessary for the flush.
• If PCID is disabled, requesting the PCID field with any value but zero, even if the PCID is not
necessary for the flush.
• If the request exceeds the number of valid ASIDs for the processor, even if the ASID is not valid.
• Attempts to request a count larger than the maximum count supported, even if the VA is not valid.
• Attempts to execute an INVLPGB while in 4M paging mode.


Guest Usage of INVLPGB. Guest usage of INVLPGB is supported only when the instruction has 
been explicitly enabled by the hypervisor in the VMCB (see APM Volume 2 Appendix B, Table B-1: 
VMCB Layout, Control Area). Support for INVLPGB/TLBSYNC hypervisor enable in VMCB is 
indicated by CPUID Fn8000_000A_EDX[24] = 1. 
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A guest that executes a legal INVLPGB that is not intercepted will have the requested ASID field 
replaced by the current ASID and the valid ASID bit set before doing the broadcast invalidation. 
Because of its broadcast nature, the ASID field must be global and all processors must allocate the 
same ASID to the same Guest for proper operation. Hypervisors that do not support a global ASID 
must intercept the Guest usage of INVLPGB, if enabled, for proper behavior. 


Two forms of INVLPGB intercepts, conditional and unconditional, are available to the hypervisor. 
The unconditional intercept traps all guest usage of INVLPGB. The conditional intercept traps only 
illegally-specified INVLPGB instructions. An illegally specified INVLPGB is one that would, if not 
intercepted, cause a #GP for any reason other than not being executed at CPL 0. 


INVLPGB is a privileged instruction but not a serializing instruction. It must be executed at CPL 0, but 
will broadcast the invalidate to the rest of the processors which may be running at any privilege level. 


INVLPGB is weakly ordered as it broadcasts the invalidation types throughout the system to all 
processors, so that a batch of invalidations can be done in a parallel fashion. For software to guarantee 
that all processors have seen and done the TLB invalidations, a TLBSYNC must be executed on the 
initiating processor. 


Mnemonic Opcode Description 


INVLPGB 0F 01 FE Invalidates TLB entry(s) with Broadcast.


Related Instructions 


TLBSYNC, INVLPG, INVLPGA, INVPCID 


rFLAGS Affected 
None. 
Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | This instruction is only recognized in protected mode.
 | X | X | X | This instruction is not supported as indicated by CPUID Fn8000_0008_EBX[INVLPGB] = 0.
 | | | X | The hypervisor has not enabled Guest usage of this instruction.
General protection, #GP | | | X | CPL was not 0.
 | | | X | EAX[11:6] is not zero or EAX[5] not zero if not supported.
 | | | X | EDX[31:28] is not zero.
 | | | X | CR4.PCID = 0 and EDX[PCID] is not zero.
 | | | X | EFER.SVME = 0 and EDX[ASID] is not zero.
 | | | X | EDX[ASID] > number of supported ASIDs.
 | | | X | ECX[15:0] > maximum page count supported.
 | | | X | 4M paging is active.
INVPCID Invalidate TLB Entry(s) in a Specified PCID


Invalidates the TLB entry or entries on the logical processor for a given PCID in the local TLB based 
on the operation type specified in the register operand and the PCID and virtual (linear) address 
specified by the descriptor in the memory operand. (See “Process Context Identifier” in Chapter 5 of 
the AMD64 Architecture Programmer’s Manual Volume 2 for more information on PCIDs.) 


The register operand is always 64 bits in 64-bit mode and 32 bits outside 64-bit mode regardless of 
value of CS.D. 


The operation type is specified in the register operand bits [1:0]. The operation type determines which 
components of the address (VA, PCID) are valid for comparison in the TLB and whether to include 
global valid bits in the invalidation process. 


The operation types are: 


reg32/64[1:0] | Action
0 | Invalidate TLB entries that match {PCID, VA} excluding Global
1 | Invalidate all TLB entries that match {PCID} excluding Global
2 | Invalidate all TLB entries including Global
3 | Invalidate all TLB entries excluding Global


The descriptor in the memory operand is formatted as follows: 


127:64 63:12 11:0 
VA Reserved, MBZ PCID 


This instruction invalidates the TLB entry or entries, regardless of the page size (4 Kbytes, 2 Mbytes, 4 
Mbytes, or 1 Gbyte). It may invalidate any number of additional TLB entries, in addition to the 
targeted entry or entries to accomplish the specified function. INVPCID follows the same rules for 
cached upper TLB entries as INVLPG which is controlled by EFER.TCE. (See Section 3, “Translation 
Cache Extension” in AMD64 Architecture Programmer’s Manual Volume 2 for more information on 
EFER.TCE.) 


If PCID is disabled (CR4.PCID = 0), all TLB entries are cached with PCID = 0. When
CR4.PCID = 0, executing INVPCID with type 0 or 1 is only allowed if the PCID specified in the
descriptor is zero. Furthermore, when CR4.PCID = 0, executing INVPCID with type 2 or 3 invalidates
mappings only for PCID = 0.
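
As an illustration of the descriptor layout and the CR4.PCID rule above, the following sketch packs the 128-bit memory operand and checks whether a request is acceptable. It is a hedged example, not AMD-provided code: the names `invpcid_descriptor` and `invpcid_request_allowed` are hypothetical, and little-endian byte order is assumed because the descriptor lives in x86 memory.

```python
import struct


def invpcid_descriptor(pcid: int, va: int = 0) -> bytes:
    """Pack the 16-byte INVPCID descriptor (hypothetical helper).

    Bits 11:0 hold the PCID, bits 63:12 are reserved and must be zero,
    and bits 127:64 hold the virtual address.
    """
    if not 0 <= pcid < (1 << 12):
        raise ValueError("PCID must fit in descriptor bits 11:0")
    if not 0 <= va < (1 << 64):
        raise ValueError("VA must fit in descriptor bits 127:64")
    return struct.pack("<QQ", pcid, va)  # low quadword first, then the VA


def invpcid_request_allowed(op_type: int, pcid: int, pcid_enabled: bool) -> bool:
    """Apply the type and CR4.PCID rules described in the text."""
    if op_type not in (0, 1, 2, 3):
        return False                      # only types 0..3 are defined
    if not pcid_enabled and op_type in (0, 1) and pcid != 0:
        return False                      # PCID must be zero when disabled
    return True


desc = invpcid_descriptor(pcid=1, va=0x7000)
```

Types 2 and 3 ignore the PCID field for matching purposes, which is why the second check applies only to types 0 and 1.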


INVPCID is a serializing instruction and a privileged instruction. The current privilege level must be 0 
to execute this instruction. 
Mnemonic Opcode Description

INVPCID reg32, mem128 66 0F 38 82 /r Invalidates the TLB entry(s) by PCID in r32 and descriptor in mem128.

INVPCID reg64, mem128 66 0F 38 82 /r Invalidates the TLB entry(s) by PCID in r64 and descriptor in mem128.


Related Instructions


INVLPG, INVLPGA, INVLPGB, TLBSYNC


rFLAGS Affected 


None. 


Exceptions

Exception | Cause of Exception
Invalid opcode, #UD | This instruction is only recognized in protected mode.
 | This instruction is not supported as indicated by CPUID Fn0000_0007_EBX_x0[INVPCID] = 0.
 | If mod=11 (register is specified instead of memory for desc).
 | If the LOCK prefix is used.
General protection, #GP | CPL was not 0.
 | An invalid type (>3) was specified in register operand.
 | Bits 63:12 of descriptor in memory operand are not all zero.
 | Invalidation type 0 was specified and the virtual address in bits 127:64 of descriptor is not canonical.
 | Invalidation type 0 or 1 and bits 11:0 of descriptor are not zero when CR4.PCIDE = 0.
 | An execute-only code segment was used to reference memory.
 | A memory address exceeded a data segment limit.
 | In long mode, the address of the memory operand was non-canonical.
 | A null data segment was used to reference memory.
Stack, #SS | A memory address exceeded the stack segment limit or was non-canonical.
Page Fault, #PF | A page fault resulted from the execution of the instruction.


IRET Return from Interrupt 
IRETD 
IRETQ 


Returns program control from an exception or interrupt handler to a program or procedure previously 
interrupted by an exception, an external interrupt, or a software-generated interrupt. These instructions 
also perform a return from a nested task. All flags, CS, and rIP are restored to the values they had 
before the interrupt so that execution may continue at the next instruction following the interrupt or 
exception. In 64-bit mode or if the CPL changes, SS and RSP are also restored. 


IRET, IRETD, and IRETQ are synonyms mapping to the same opcode. They are intended to provide 
semantically distinct forms for various opcode sizes. The IRET instruction is used for 16-bit operand 
size; IRETD is used for 32-bit operand sizes; IRETQ is used for 64-bit operands. The latter form is 
only meaningful in 64-bit mode. 


IRET, IRETD, or IRETQ must be used to terminate the exception or interrupt handler associated with 
the exception, external interrupt, or software-generated interrupt. 


IRETx is a serializing instruction. 


For detailed descriptions of the steps performed by IRETx instructions, see the following: 


• Legacy-Mode Interrupts: “Legacy Protected-Mode Interrupt Control Transfers” in Volume 2.
• Long-Mode Interrupts: “Long-Mode Interrupt Control Transfers” in Volume 2.


Mnemonic Opcode Description 
IRET CF Return from interrupt (16-bit operand size). 
IRETD CF Return from interrupt (32-bit operand size). 
IRETQ CF Return from interrupt (64-bit operand size). 
Action 


// For functions READ_DESCRIPTOR, ShadowStacksEnabled 
// see "Pseudocode Definition" on page 57 


IRET_START:

IF (REAL_MODE)
    IRET_REAL
ELSIF (PROTECTED_MODE)
    IRET_PROTECTED
ELSE // (VIRTUAL_MODE)
    IRET_VIRTUAL
IRET_REAL:

    POP.v temp_RIP
    POP.v temp_CS
    POP.v temp_RFLAGS

    IF (temp_RIP > CS.limit)
        EXCEPTION [#GP(0)]

    CS.sel = temp_CS
    CS.base = temp_CS SHL 4
    RFLAGS.v = temp_RFLAGS    // VIF,VIP,VM unchanged
    RIP = temp_RIP
    EXIT


IRET_PROTECTED:

    IF (RFLAGS.NT == 1)
        IF (LEGACY_MODE)           // IRET does a task-switch to a previous task
            TASK_SWITCH            // using the 'back link' field in the TSS
        ELSE // (LONG_MODE)
            EXCEPTION [#GP(0)]     // task switches aren’t supported in long mode

    POP.v temp_RIP
    POP.v temp_CS
    POP.v temp_RFLAGS

    IF ((temp_RFLAGS.VM == 1) && (CPL == 0) && (LEGACY_MODE))
        IRET_FROM_PROTECTED_TO_VIRTUAL

    IF (temp_CS.rpl == CPL)
        changing_CPL = FALSE
    ELSEIF (temp_CS.rpl > CPL)
        changing_CPL = TRUE
    ELSE // (temp_CS.rpl < CPL)
        EXCEPTION [#GP(temp_CS)]   // IRET to greater priv not allowed

    IF ((64BIT_MODE) || (changing_CPL))
        POP.v temp_RSP   // in 64-bit mode or changing CPL, IRET always pops SS:RSP
        POP.v temp_SS

    CS = READ_DESCRIPTOR (temp_CS, iret_chk)

    IF ((64BIT_MODE) && (temp_RIP is non-canonical) ||
        (!64BIT_MODE) && (temp_RIP > CS.limit))
        EXCEPTION [#GP(0)]

    IF (changing_CPL)
        IRET_PROTECTED_TO_OUTER_PRIV
    ELSE
        IRET_PROTECTED_TO_SAME_PRIV


IRET_PROTECTED_TO_OUTER_PRIV:

    CPL = CS.rpl

    // SS:RSP were popped, so load them into the registers
    SS = READ_DESCRIPTOR (temp_SS, ss_chk)
    RSP.s = temp_RSP

    // pop shadow stack and compare with program stack
    IF (ShadowStacksEnabled(old CPL))
    {
        IF (SSP[2:0] != 0)
            EXCEPTION [#CP(RETF/IRET)]                 // SSP must be 8-byte aligned
        IF (temp_newCPL != 3)
        {
            temp_sstk_CS = SSTK_READ_MEM.q [SSP + 16]  // read CS from sstk
            temp_sstk_LIP = SSTK_READ_MEM.q [SSP + 8]  // read LIP
            temp_SSP = SSTK_READ_MEM.q [SSP]           // read previous SSP
            SSP = SSP + 24
            IF (temp_CS != temp_sstk_CS)
                EXCEPTION [#CP(RETF/IRET)]             // CS mismatch
            IF ((CS.base + RIP) != temp_sstk_LIP)
                EXCEPTION [#CP(RETF/IRET)]             // LIP mismatch
            IF (temp_SSP[1:0] != 0)
                EXCEPTION [#CP(RETF/IRET)]             // prevSSP must be 4-byte aligned
        }
    }

    temp_oldSSP = SSP
    IF (ShadowStacksEnabled(new CPL))
        IF (new CPL == 3)
            temp_SSP = PL3_SSP
        IF ((COMPATIBILITY_MODE) && (temp_SSP[63:32] != 0))
            EXCEPTION [#GP(0)]                         // SSP must be <4GB in compat mode
        SSP = temp_SSP

    IF (ShadowStacksEnabled(old CPL))                  // check shadow stack token, clear busy
    {
        bool invalid_Token = FALSE
        < start atomic section >
        temp_Token = SSTK_READ_MEM.q [temp_oldSSP]     // read supervisor sstk token
        IF ((temp_Token AND 0x01) != 1)
            invalid_Token = TRUE                       // token busy bit must be 1
        IF ((temp_Token AND ~0x01) != temp_oldSSP)
            invalid_Token = TRUE                       // address in token must=oldSSP
        IF (!invalid_Token)
            temp_Token = temp_Token AND ~0x01          // clear token busy, if valid
        SSTK_WRITE_MEM.q [temp_oldSSP] = temp_Token    // writeback token
        < end atomic section >
    } // end shadow stacks enabled at old CPL

    FOR (seg = ES, DS, FS, GS)
        IF ((seg.sel == NULL) || ((seg.attr.dpl < CPL) &&
            ((seg.attr.type == 'data') ||
             (seg.attr.type == 'non-conforming-code'))))
            seg = NULL   // can’t use lower DPL data segment at higher CPL
                         // also clears RPL of any null selectors

    RFLAGS.v = temp_RFLAGS   // VIF,VIP,IOPL only changed if old_CPL == 0
                             // IF only changed if old_CPL <= old_RFLAGS.IOPL
                             // VM unchanged
                             // RF cleared
    RIP = temp_RIP
    EXIT   // end IRET_PROTECTED_TO_OUTER_PRIV


IRET_PROTECTED_TO_SAME_PRIV:

    IF (started in 64-bit mode)
    {   // in Long Mode SS:RSP were popped, so load them into the registers
        SS = READ_DESCRIPTOR (temp_SS, ss_chk)
        RSP.s = temp_RSP
    }

    IF (ShadowStacksEnabled(current CPL))              // pop the shadow stack
    {                                                  // and compare with program stack
        IF (SSP[2:0] != 0)
            EXCEPTION [#CP(RETF/IRET)]                 // SSP must be 8-byte aligned
        temp_sstk_CS = SSTK_READ_MEM.q [SSP + 16]      // read CS from sstk
        temp_sstk_LIP = SSTK_READ_MEM.q [SSP + 8]      // read LIP
        temp_SSP = SSTK_READ_MEM.q [SSP]               // read previous SSP
        SSP = SSP + 24
        IF (temp_CS != temp_sstk_CS)
            EXCEPTION [#CP(RETF/IRET)]                 // CS mismatch
        IF ((CS.base + RIP) != temp_sstk_LIP)
            EXCEPTION [#CP(RETF/IRET)]                 // LIP mismatch
        IF (temp_SSP[1:0] != 0)
            EXCEPTION [#CP(RETF/IRET)]                 // prevSSP must be 4-byte aligned
        IF ((COMPATIBILITY_MODE) && (tmp_sstk_prevSSP[63:32] != 0))
            EXCEPTION [#GP(0)]                         // prevSSP must be <4GB in compat mode
    } // end shadow stack enabled at current CPL

    // check shadow stack token, clear busy
    IF ((ShadowStacksEnabled(currentCPL)) && (LONG_MODE))
    {
        bool invalid_Token = FALSE
        < start atomic section >
        temp_Token = SSTK_READ_MEM.q [temp_oldSSP]     // read supervisor sstk token
        IF ((temp_Token AND 0x01) != 1)
            invalid_Token = TRUE                       // token busy bit must be 1
        IF ((temp_Token AND ~0x01) != temp_oldSSP)
            invalid_Token = TRUE                       // address in token must=oldSSP
        IF (temp_SSP == SSP)
            to_same_sstk = TRUE                        // switch was to same sstk
        IF ((!invalid_Token) AND (!to_same_sstk))
            temp_Token = temp_Token AND ~0x01          // clear token busy, if valid
        SSTK_WRITE_MEM.q [temp_oldSSP] = temp_Token    // writeback token
        < end atomic section >
    } // end shadow stacks enabled at CPL and in Long Mode

    RFLAGS.v = temp_RFLAGS   // VIF,VIP,IOPL only changed if old_CPL == 0
                             // IF only changed if old_CPL <= old_RFLAGS.IOPL
                             // VM unchanged
                             // RF cleared
    RIP = temp_RIP
    EXIT   // end IRET_PROTECTED_TO_SAME_PRIV
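
The supervisor shadow stack token check that appears in the routines above can be modeled in isolation. The sketch below is an assumption-laden illustration, not part of the architectural pseudocode: the function name `release_supervisor_token` and its parameters are hypothetical, memory access and atomicity are not modeled, and the token is treated as a plain integer whose bit 0 is the busy bit.

```python
def release_supervisor_token(token: int, old_ssp: int, to_same_sstk: bool = False) -> int:
    """Return the token value that would be written back (hypothetical model).

    The busy bit is cleared only if the token is valid: bit 0 must be 1 and
    the remaining bits must equal the old SSP. When the return stays on the
    same shadow stack, the token is written back unchanged.
    """
    invalid = False
    if (token & 0x01) != 1:
        invalid = True                    # token busy bit must be 1
    if (token & ~0x01) != old_ssp:
        invalid = True                    # address in token must equal old SSP
    if not invalid and not to_same_sstk:
        token &= ~0x01                    # clear token busy, if valid
    return token


written_back = release_supervisor_token(0x1001, 0x1000)
```

An invalid token is written back as it was read, which matches the pseudocode: the write-back is unconditional and only the clearing of the busy bit depends on validity.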


IRET_VIRTUAL:

    IF ((RFLAGS.IOPL < 3) && (CR4.VME == 0))
        EXCEPTION [#GP(0)]

    POP.v temp_RIP
    POP.v temp_CS
    POP.v temp_RFLAGS

    IF (temp_RIP > CS.limit)
        EXCEPTION [#GP(0)]

    IF (RFLAGS.IOPL == 3)
    {
        RFLAGS.v = temp_RFLAGS   // VIF,VIP,VM,IOPL unchanged, RF cleared
        CS.sel = temp_CS
        CS.base = temp_CS SHL 4
        RIP = temp_RIP
        EXIT
    }
    // (IOPL < 3) && (CR4.VME == 1)
    ELSEIF ((OPERAND_SIZE == 16) &&
            ((temp_RFLAGS.IF == 0) || (RFLAGS.VIP == 0)) &&
            (temp_RFLAGS.TF == 0))
    {
        RFLAGS.w = temp_RFLAGS   // RFLAGS.VIF = temp_RFLAGS.IF
                                 // IF unchanged, RF cleared
        CS.sel = temp_CS
        CS.base = temp_CS SHL 4
        RIP = temp_RIP
        EXIT
    }
    ELSE
        // ((RFLAGS.IOPL < 3) && (CR4.VME == 1) && ((OPERAND_SIZE == 32) ||
        // ((temp_RFLAGS.IF == 1) && (RFLAGS.VIP == 1)) ||
        // (temp_RFLAGS.TF == 1)))
        EXCEPTION [#GP(0)]


IRET_FROM_PROTECTED_TO_VIRTUAL:

    // temp_RIP already popped
    // temp_CS already popped
    // temp_RFLAGS already popped, temp_RFLAGS.VM = 1
    // and CPL = 0

    POP.d temp_RSP
    POP.d temp_SS
    POP.d temp_ES
    POP.d temp_DS
    POP.d temp_FS
    POP.d temp_GS

    // force the segments to have virtual-mode values
    FOR (seg = CS, SS, ES, DS, FS, GS)
    {
        seg.sel = temp_seg
        seg.base = temp_seg SHL 4
        seg.limit = 0x0000FFFF

        IF (seg == CS)
            CS.attr = 16-bit dpl3 code
        ELSEIF (seg == SS)
            SS.attr = 16-bit dpl3 stack
        ELSE
            seg.attr = 16-bit dpl3 data
    }
    RSP.d = temp_RSP
    RFLAGS.d = temp_RFLAGS
    CPL = 3

    temp_oldSSP = SSP

    IF (ShadowStacksEnabled(old CPL))                  // old CPL is 0 at this point
    {                                                  // check shadow stack token, clear busy
        bool invalid_Token = FALSE
        < start atomic section >
        temp_Token = SSTK_READ_MEM.q [temp_oldSSP]     // read supervisor sstk token
        IF ((temp_Token AND 0x01) != 1)
            invalid_Token = TRUE                       // token busy bit must be 1
        IF ((temp_Token AND ~0x01) != temp_oldSSP)
            invalid_Token = TRUE                       // address in token must = oldSSP
        IF (!invalid_Token)
            temp_Token = temp_Token AND ~0x01          // clear token busy, if valid
        SSTK_WRITE_MEM.q [temp_oldSSP] = temp_Token    // writeback token
        < end atomic section >
    } // end shadow stacks enabled at old CPL

    RIP = temp_RIP AND 0x0000FFFF
    EXIT   // end IRET_FROM_PROTECTED_TO_VIRTUAL


Related Instructions


INT, INTO, INT3


rFLAGS Affected 
ID | VIP | VIF | AC | VM | RF | NT IOPL OF | DF | IF | TF | SF | ZF | AF | PF | CF 
Mi|M|M/]}MJ]M/|M JM M Mi|M/M{}]M{/]M]M{/]M ]M/M 
21 | 20 | 19 | 18 | 17 | 16 | 14 13:12 11 | 10 | 9 8 | 7 6 | 4 | 2/0 


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or cleared to zero is M (modified). Unaffected flags 
are blank. Undefined flags are U. 


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Segment not present, #NP (selector) | | | X | The return code segment was marked not present.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
Stack, #SS (selector) | | | X | The SS register was loaded with a non-null segment selector and the segment was marked not present.
General protection, #GP | X | X | X | The target offset exceeded the code segment limit or was non-canonical.
 | | X | | IOPL was less than 3 and one of the following conditions was true: CR4.VME was 0; the effective operand size was 32-bit; both the original EFLAGS.VIP and the new EFLAGS.IF were set; the new EFLAGS.TF was set.
 | | | X | IRETx was executed in long mode while EFLAGS.NT=1.
General protection, #GP (selector) | | | X | The return code selector was a null selector.
 | | | X | The return stack selector was a null selector and the return mode was non-64-bit mode or CPL was 3.
 | | | X | The return code or stack descriptor exceeded the descriptor table limit.
 | | | X | The return code or stack selector’s TI bit was set but the LDT selector was a null selector.
 | | | X | The segment descriptor for the return code was not a code segment.
 | | | X | The RPL of the return code segment selector was less than the CPL.
 | | | X | The return code segment was non-conforming and the segment selector’s DPL was not equal to the RPL of the code segment’s segment selector.
 | | | X | The return code segment was conforming and the segment selector’s DPL was greater than the RPL of the code segment’s segment selector.
 | | | X | The segment descriptor for the return stack was not a writable data segment.
 | | | X | The stack segment descriptor DPL was not equal to the RPL of the return code segment selector.
 | | | X | The stack segment selector RPL was not equal to the RPL of the return code segment selector.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | X | X | An unaligned memory reference was performed while alignment checking was enabled.
Control-protection, #CP | | | X | The return address on the program stack did not match the address on the shadow stack, or the previous SSP is not 4 byte aligned, or the previous SSP was not <4GB when returning to 32-bit mode or compatibility mode.
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LAR Load Access Rights Byte 


Loads the access rights from the segment descriptor specified by a 16-bit source register or memory 
operand into a specified 16-bit, 32-bit, or 64-bit general-purpose register and sets the zero (ZF) flag in 
the rFLAGS register if successful. LAR clears the zero flag if the descriptor is invalid for any reason. 


The LAR instruction checks that: 
• the segment selector is not a null selector.
• the descriptor is within the GDT or LDT limit.
• the descriptor DPL is greater than or equal to both the CPL and RPL, or the segment is a
conforming code segment.
• the descriptor type is valid for the LAR instruction. Valid descriptor types are shown in the
following table. LDT and TSS descriptors in 64-bit mode, and call-gate descriptors in long mode,
are only valid if bits 12:8 of doubleword +12 are zero.


See Volume 2, Section 6.4 for more information on checking access rights using LAR. 


Valid Descriptor Type (Legacy Mode) | Valid Descriptor Type (Long Mode) | Description
All | All | All code and data descriptors
1 | - | Available 16-bit TSS
2 | 2 | LDT
3 | - | Busy 16-bit TSS
4 | - | 16-bit call gate
5 | - | Task gate
9 | 9 | Available 32-bit or 64-bit TSS
B | B | Busy 32-bit or 64-bit TSS
C | C | 32-bit or 64-bit call gate


If the segment descriptor passes these checks, the attributes are loaded into the destination general- 
purpose register. If it does not, then the zero flag is cleared and the destination register is not modified. 


When the operand size is 16 bits, access rights include the DPL and Type fields located in bytes 4 and
5 of the descriptor table entry. Before loading the access rights into the destination operand, the low
order word is masked with FF00H.


When the operand size is 32 or 64 bits, access rights include the DPL and type as well as the descriptor
type (S field), segment present (P flag), available to system (AVL flag), default operation size (D/B
flag), and granularity flags located in bytes 4–7 of the descriptor. Before being loaded into the
destination operand, the doubleword is masked with 00FF_FF00H.


In 64-bit mode, for both 32-bit and 64-bit operand sizes, 32-bit register results are zero-extended to 64 
bits. 
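
The masking rules above can be illustrated with a short Python sketch. It assumes the caller has already fetched bytes 4 through 7 of the descriptor as a little-endian doubleword; the function name `lar_result` is hypothetical and the descriptor validity checks are not modeled.

```python
def lar_result(descriptor_high_dword, operand_size):
    """Model the value LAR writes to its destination register.

    descriptor_high_dword: bytes 4-7 of the descriptor as one 32-bit value.
    operand_size: 16, 32, or 64 (bits).
    """
    if operand_size == 16:
        # Low-order word masked with FF00h: only byte 5 survives.
        return descriptor_high_dword & 0xFF00
    if operand_size in (32, 64):
        # Doubleword masked with 00FF_FF00h; a 64-bit destination receives
        # the same value zero-extended.
        return descriptor_high_dword & 0x00FF_FF00
    raise ValueError("operand size must be 16, 32, or 64")
```

For example, a descriptor whose upper doubleword is 12CF9A34h yields 9A00h with a 16-bit operand size and 00CF9A00h with a 32-bit or 64-bit operand size.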


This instruction can only be executed in protected mode. 


Mnemonic                Opcode     Description
LAR reg16, reg/mem16    0F 02 /r   Reads the GDT/LDT descriptor referenced by the 16-bit
                                   source operand, masks the attributes with FF00h and saves
                                   the result in the 16-bit destination register.
LAR reg32, reg/mem16    0F 02 /r   Reads the GDT/LDT descriptor referenced by the 16-bit
                                   source operand, masks the attributes with 00FFFF00h and
                                   saves the result in the 32-bit destination register.
LAR reg64, reg/mem16    0F 02 /r   Reads the GDT/LDT descriptor referenced by the 16-bit
                                   source operand, masks the attributes with 00FFFF00h and
                                   saves the result in the 64-bit destination register.


Related Instructions 


ARPL, LSL, VERR, VERW 


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       |    |    |    |    |    | M  |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or zero is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | This instruction is only recognized in protected mode.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded the data segment limit or was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | | X | An unaligned memory reference was performed while alignment checking was enabled.
LGDT Load Global Descriptor Table Register


Loads the pseudo-descriptor specified by the source operand into the global descriptor table register 
(GDTR). The pseudo-descriptor is a memory location containing the GDTR base and limit. In legacy 
and compatibility mode, the pseudo-descriptor is 6 bytes; in 64-bit mode, it is 10 bytes. 


If the operand size is 16 bits, the high-order byte of the 6-byte pseudo-descriptor is not used. The lower 
two bytes specify the 16-bit limit and the third, fourth, and fifth bytes specify the 24-bit base address. 
The high-order byte of the GDTR is filled with zeros. 


If the operand size is 32 bits, the lower two bytes specify the 16-bit limit and the upper four bytes 
specify a 32-bit base address. 


In 64-bit mode, the lower two bytes specify the 16-bit limit and the upper eight bytes specify a 64-bit
base address. In 64-bit mode, operand-size prefixes are ignored and the operand size is forced to
64 bits; therefore, the pseudo-descriptor is always 10 bytes.
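
The pseudo-descriptor layouts described above can be illustrated with Python's `struct` module. This is a sketch of the memory image only; the helper names are hypothetical, and the code models how software might build or interpret the bytes rather than what the processor does internally.

```python
import struct


def pack_pseudo_descriptor(limit, base, long_mode):
    """Build the memory image read by LGDT/LIDT.

    6 bytes (16-bit limit + 32-bit base) in legacy and compatibility mode,
    10 bytes (16-bit limit + 64-bit base) in 64-bit mode.
    """
    fmt = "<HQ" if long_mode else "<HI"  # "<" = little-endian, no padding
    return struct.pack(fmt, limit, base)


def parse_pseudo_descriptor(data, operand_size):
    """Return (limit, base) as interpreted for the given operand size."""
    limit = int.from_bytes(data[0:2], "little")
    if operand_size == 16:
        # Only a 24-bit base is used; the high-order byte is ignored and the
        # upper byte of the register base is zero-filled.
        base = int.from_bytes(data[2:5], "little")
    elif operand_size == 32:
        base = int.from_bytes(data[2:6], "little")
    elif operand_size == 64:
        base = int.from_bytes(data[2:10], "little")
    else:
        raise ValueError("operand size must be 16, 32, or 64")
    return limit, base
```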


This instruction is only used in operating system software and must be executed at CPL 0. It is 
typically executed once in real mode to initialize the processor before switching to protected mode. 


LGDT is a serializing instruction. 


Mnemonic          Opcode     Description
LGDT mem16:32     0F 01 /2   Loads mem16:32 into the global descriptor table register.
LGDT mem16:64     0F 01 /2   Loads mem16:64 into the global descriptor table register.


Related Instructions 


LIDT, LLDT, LTR, SGDT, SIDT, SLDT, STR 


rFLAGS Affected 
None 
Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The operand was a register.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded the data segment limit or was non-canonical.
General protection, #GP | | X | X | CPL was not 0.
General protection, #GP | | | X | The new GDT base address was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
LIDT Load Interrupt Descriptor Table Register


Loads the pseudo-descriptor specified by the source operand into the interrupt descriptor table register 
(IDTR). The pseudo-descriptor is a memory location containing the IDTR base and limit. In legacy 
and compatibility mode, the pseudo-descriptor is six bytes; in 64-bit mode, it is 10 bytes. 


If the operand size is 16 bits, the high-order byte of the 6-byte pseudo-descriptor is not used. The lower 
two bytes specify the 16-bit limit and the third, fourth, and fifth bytes specify the 24-bit base address. 
The high-order byte of the IDTR is filled with zeros. 


If the operand size is 32 bits, the lower two bytes specify the 16-bit limit and the upper four bytes 
specify a 32-bit base address. 


In 64-bit mode, the lower two bytes specify the 16-bit limit, and the upper eight bytes specify a 64-bit
base address. In 64-bit mode, operand-size prefixes are ignored and the operand size is forced to
64 bits; therefore, the pseudo-descriptor is always 10 bytes.


This instruction is only used in operating system software and must be executed at CPL 0. It is 
normally executed once in real mode to initialize the processor before switching to protected mode. 


LIDT is a serializing instruction. 


Mnemonic          Opcode     Description
LIDT mem16:32     0F 01 /3   Loads mem16:32 into the interrupt descriptor table register.
LIDT mem16:64     0F 01 /3   Loads mem16:64 into the interrupt descriptor table register.


Related Instructions 


LGDT, LLDT, LTR, SGDT, SIDT, SLDT, STR 


rFLAGS Affected 
None 
Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The operand was a register.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded the data segment limit or was non-canonical.
General protection, #GP | | X | X | CPL was not 0.
General protection, #GP | | | X | The new IDT base address was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
LLDT Load Local Descriptor Table Register


Loads the specified segment selector into the visible portion of the local descriptor table register
(LDTR). The processor uses the selector to locate the descriptor for the LDT in the global descriptor
table. It then loads this descriptor into the hidden portion of the LDTR.


If the source operand is a null selector, the LDTR is marked invalid and all references to descriptors in 
the LDT will generate a general protection exception (#GP), except for the LAR, VERR, VERW or 
LSL instructions. 


In legacy and compatibility modes, the LDT descriptor is 8 bytes long and contains a 32-bit base 
address. 


In 64-bit mode, the LDT descriptor is 16 bytes long and contains a 64-bit base address. The LDT
descriptor type (02h) is redefined in 64-bit mode for use as the 16-byte LDT descriptor. 


This instruction must be executed in protected mode. It is only provided for use by operating system 
software at CPL 0. 


LLDT is a serializing instruction. 


Mnemonic          Opcode     Description
LLDT reg/mem16    0F 00 /2   Load the 16-bit segment selector into the local descriptor
                             table register and load the LDT descriptor from the GDT.


Related Instructions 


LGDT, LIDT, LTR, SGDT, SIDT, SLDT, STR 


rFLAGS Affected 
None 
Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | This instruction is only recognized in protected mode.
Segment not present, #NP (selector) | | | X | The LDT descriptor was marked not present.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | CPL was not 0.
General protection, #GP | | | X | A null data segment was used to reference memory.
General protection, #GP (selector) | | | X | The source selector did not point into the GDT.
General protection, #GP (selector) | | | X | The descriptor was beyond the GDT limit.
General protection, #GP (selector) | | | X | The descriptor was not an LDT descriptor.
General protection, #GP (selector) | | | X | The descriptor's extended attribute bits were not zero in 64-bit mode.
General protection, #GP (selector) | | | X | The new LDT base address was non-canonical.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
LMSW Load Machine Status Word


Loads the lower four bits of the 16-bit register or memory operand into bits 3:0 of the machine status
word in register CR0. Only the protection enabled (PE), monitor coprocessor (MP), emulation (EM),
and task switched (TS) bits of CR0 are modified. Additionally, LMSW can set CR0.PE, but cannot
clear it.
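
The update rule described above can be modeled as follows. This is a minimal sketch, assuming the caller supplies the current CR0 value and the 16-bit source operand; the function name `lmsw` and the constant names are illustrative, and privilege checks are not modeled.

```python
CR0_PE = 0x1  # protection enabled
CR0_MP = 0x2  # monitor coprocessor
CR0_EM = 0x4  # emulation
CR0_TS = 0x8  # task switched


def lmsw(cr0, source):
    """Return the CR0 value after LMSW loads `source`.

    Only bits 3:0 are taken from the source. PE can be set by LMSW but an
    already-set PE is preserved, because LMSW cannot clear it.
    """
    low = source & 0xF
    if cr0 & CR0_PE:
        low |= CR0_PE
    return (cr0 & ~0xF) | low
```

For instance, loading a source of 0 while CR0 is 8000_0011h leaves CR0 unchanged, because PE stays set and no other low bits were set.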


The LMSW instruction can be used only when the current privilege level is 0. It is only provided for 
compatibility with early processors. 


Use the MOV CR0 instruction to load all 32 or 64 bits of CR0.


Mnemonic          Opcode     Description
LMSW reg/mem16    0F 01 /6   Load the lower 4 bits of the source into the lower 4 bits of
                             CR0.


Related Instructions 


MOV CRn, SMSW 


rFLAGS Affected 
None 
Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | X | X | CPL was not 0.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
LSL Load Segment Limit


Loads the segment limit from the segment descriptor specified by a 16-bit source register or memory 
operand into a specified 16-bit, 32-bit, or 64-bit general-purpose register and sets the zero (ZF) flag in 
the rFLAGS register if successful. LSL clears the zero flag if the descriptor is invalid for any reason. 


In 64-bit mode, for both 32-bit and 64-bit operand sizes, 32-bit register results are zero-extended to 64 
bits. 


The LSL instruction checks that:
• the segment selector is not a null selector.
• the descriptor is within the GDT or LDT limit.
• the descriptor DPL is greater than or equal to both the CPL and RPL, or the segment is a
  conforming code segment.
• the descriptor type is valid for the LSL instruction. Valid descriptor types are shown in the
  following table. LDT and TSS descriptors in 64-bit mode are only valid if bits 12:8 of doubleword
  +12 are zero, as described in “System Descriptors” in Volume 2.


Valid Descriptor Type
Legacy Mode   Long Mode   Description
-             -           All code and data descriptors
1             -           Available 16-bit TSS
2             2           LDT
3             -           Busy 16-bit TSS
9             9           Available 32-bit or 64-bit TSS
B             B           Busy 32-bit or 64-bit TSS


If the segment selector passes these checks and the segment limit is loaded into the destination 
general-purpose register, the instruction sets the zero flag of the rFLAGS register to 1. If the selector 
does not pass the checks, then LSL clears the zero flag to 0 and does not modify the destination. 


The instruction calculates the segment limit to 32 bits, taking the 20-bit limit and the granularity bit 
into account. When the operand size is 16 bits, it truncates the upper 16 bits of the 32-bit adjusted 
segment limit and loads the lower 16-bits into the target register. 
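
The limit calculation described above can be written out as a short Python sketch. It assumes the caller has already extracted the 20-bit limit field and the granularity (G) bit from the descriptor; `lsl_result` is a hypothetical name, and the descriptor validity checks are not modeled.

```python
def lsl_result(limit20, granularity, operand_size):
    """Model the value LSL writes to its destination register.

    limit20: the 20-bit limit field from the descriptor.
    granularity: the G bit (1 = limit counted in 4-KB units).
    operand_size: 16, 32, or 64 (bits).
    """
    limit20 &= 0xFFFFF
    if granularity:
        # Scale by 4 KB and fill the low 12 bits with ones.
        limit32 = (limit20 << 12) | 0xFFF
    else:
        limit32 = limit20
    if operand_size == 16:
        # The upper 16 bits of the adjusted limit are truncated.
        return limit32 & 0xFFFF
    return limit32
```

A limit field of FFFFFh with G = 1 gives FFFF_FFFFh, while the same field with G = 0 and a 16-bit operand size gives FFFFh.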


Mnemonic                Opcode     Description
LSL reg16, reg/mem16    0F 03 /r   Loads a 16-bit general-purpose register with the segment
                                   limit for a selector specified in a 16-bit memory or register
                                   operand.
LSL reg32, reg/mem16    0F 03 /r   Loads a 32-bit general-purpose register with the segment
                                   limit for a selector specified in a 16-bit memory or register
                                   operand.
LSL reg64, reg/mem16    0F 03 /r   Loads a 64-bit general-purpose register with the segment
                                   limit for a selector specified in a 16-bit memory or register
                                   operand.


Related Instructions


ARPL, LAR, VERR, VERW


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       |    |    |    |    |    | M  |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | This instruction is only recognized in protected mode.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
Alignment check, #AC | | | X | An unaligned memory reference was performed while alignment checking was enabled.
LTR Load Task Register


Loads the specified segment selector into the visible portion of the task register (TR). The processor 
uses the selector to locate the descriptor for the TSS in the global descriptor table. It then loads this 
descriptor into the hidden portion of TR. The TSS descriptor in the GDT is marked busy, but no task 
switch is made. 


If the source operand is null, a general protection exception (#GP) is generated. 


In legacy and compatibility modes, the TSS descriptor is 8 bytes long and contains a 32-bit base 
address. 


In 64-bit mode, the instruction references a 64-bit descriptor to load a 64-bit base address. The TSS 
type (09H) is redefined in 64-bit mode for use as the 16-byte TSS descriptor. 


This instruction must be executed in protected mode when the current privilege level is 0. It is only 
provided for use by operating system software. 


The operand size attribute has no effect on this instruction. 


LTR is a serializing instruction. 


Mnemonic          Opcode     Description
LTR reg/mem16     0F 00 /3   Load the 16-bit segment selector into the task register and
                             load the TSS descriptor from the GDT.


Related Instructions 


LGDT, LIDT, LLDT, STR, SGDT, SIDT, SLDT 


rFLAGS Affected 
None 
Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | | This instruction is only recognized in protected mode.
Segment not present, #NP (selector) | | | X | The TSS descriptor was marked not present.
Stack, #SS | | | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | | | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | | | X | CPL was not 0.
General protection, #GP | | | X | A null data segment was used to reference memory.
General protection, #GP | | | X | The new TSS selector was a null selector.
General protection, #GP (selector) | | | X | The source selector did not point into the GDT.
General protection, #GP (selector) | | | X | The descriptor was beyond the GDT limit.
General protection, #GP (selector) | | | X | The descriptor was not an available TSS descriptor.
General protection, #GP (selector) | | | X | The descriptor's extended attribute bits were not zero in 64-bit mode.
General protection, #GP (selector) | | | X | The new TSS base address was non-canonical.
Page fault, #PF | | | X | A page fault resulted from the execution of the instruction.
MONITOR Setup Monitor Address


Establishes a linear address range of memory for hardware to monitor and puts the processor in the 
monitor event pending state. When in the monitor event pending state, the monitoring hardware 
detects stores to the specified linear address range and causes the processor to exit the monitor event 
pending state. The MWAIT instruction uses the state of the monitor hardware. 


The address range should be a write-back memory type. Executing MONITOR on an address range for 
a non-write-back memory type is not guaranteed to cause the processor to enter the monitor event 
pending state. The size of the linear address range that is established by the MONITOR instruction can 
be determined by CPUID function 0000_0005h. 


The [rAX] register provides the effective address. The DS segment is the default segment used to 
create the linear address. Segment overrides may be used with the MONITOR instruction. 


The ECX register specifies optional extensions for the MONITOR instruction. There are currently no
extensions defined and setting any bits in ECX will result in a #GP exception. The ECX register
operand is implicitly 32 bits.


The EDX register specifies optional hints for the MONITOR instruction. There are currently no hints
defined and EDX is ignored by the processor. The EDX register operand is implicitly 32 bits.


The MONITOR instruction can be executed at CPL 0 and is allowed at CPL > 0 only if MSR
C001_0015h[MonMwaitUserEn] = 1. When MSR C001_0015h[MonMwaitUserEn] = 0,
MONITOR generates #UD at CPL > 0. (See the BIOS and Kernel Developer’s Guide applicable to
your product for specific details on MSR C001_0015h.)


MONITOR performs the same segmentation and paging checks as a 1-byte read. 


Support for the MONITOR instruction is indicated by CPUID Fn0000_0001_ECX[MONITOR] = 1. 
Software must check the CPUID bit once per program or library initialization before using the 
MONITOR instruction, or inconsistent behavior may result. Software designed to run at CPL greater 
than 0 must also check for availability by testing whether executing MONITOR causes a #UD 
exception. 


The following pseudo-code shows typical usage of a MONITOR/MWAIT pair:


EAX = Linear_Address_to_Monitor;
ECX = 0; // Extensions
EDX = 0; // Hints


while (!matching_store_done) {
    MONITOR EAX, ECX, EDX
    IF (!matching_store_done) {
        MWAIT EAX, ECX
    }
}
Mnemonic   Opcode     Description
MONITOR    0F 01 C8   Establishes a linear address range to be monitored
                      by hardware and activates the monitor hardware.


Related Instructions


MWAIT, MONITORX, MWAITX


rFLAGS Affected 
None 
Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The MONITOR/MWAIT instructions are not supported, as indicated by CPUID Fn0000_0001_ECX[MONITOR] = 0.
Invalid opcode, #UD | | X | X | CPL was not 0 and MSR C001_0015[MonMwaitUserEn] = 0.
Stack, #SS | X | X | X | A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP | X | X | X | A memory address exceeded a data segment limit or was non-canonical.
General protection, #GP | X | X | X | ECX was non-zero.
General protection, #GP | | | X | A null data segment was used to reference memory.
Page Fault, #PF | | X | X | A page fault resulted from the execution of the instruction.
MOV CRn Move to/from Control Registers


Moves the contents of a 32-bit or 64-bit general-purpose register to a control register or vice versa.


In 64-bit mode, the operand size is fixed at 64 bits without the need for a REX prefix. In non-64-bit 
mode, the operand size is fixed at 32 bits and the upper 32 bits of the destination are forced to 0. 


CR0 maintains the state of various control bits. CR2 and CR3 are used for page translation. CR4 holds
various feature enable bits. CR8 is used to prioritize external interrupts. CR1, CR5, CR6, CR7, and 
CR9 through CR15 are all reserved and raise an undefined opcode exception (#UD) if referenced. 


CR8 can be read and written in 64-bit mode, using a REX prefix. CR8 can be read and written in all 
modes using a LOCK prefix instead of a REX prefix to specify the additional opcode bit. To verify 
whether the LOCK prefix can be used in this way, check for support of this feature. CPUID
Fn8000_0001_ECX[AltMovCr8] = 1 indicates that this feature is supported.


For more information on using the CPUID instruction, see the description of the CPUID instruction on 
page 165. 


CR8 can also be read and modified using the task priority register described in “System-Control 
Registers” in Volume 2. 


This instruction is always treated as a register-to-register (MOD = 11) instruction, regardless of the 
encoding of the MOD field in the MODR/M byte. 


MOV CRn is a privileged instruction and must always be executed at CPL = 0.


MOV CRn is a serializing instruction.


Mnemonic          Opcode        Description
MOV CRn, reg32    0F 22 /r      Move the contents of a 32-bit register to CRn.
MOV CRn, reg64    0F 22 /r      Move the contents of a 64-bit register to CRn.
MOV reg32, CRn    0F 20 /r      Move the contents of CRn to a 32-bit register.
MOV reg64, CRn    0F 20 /r      Move the contents of CRn to a 64-bit register.
MOV CR8, reg32    F0 0F 22 /r   Move the contents of a 32-bit register to CR8.
MOV CR8, reg64    F0 0F 22 /r   Move the contents of a 64-bit register to CR8.
MOV reg32, CR8    F0 0F 20 /r   Move the contents of CR8 into a 32-bit register.
MOV reg64, CR8    F0 0F 20 /r   Move the contents of CR8 into a 64-bit register.


Related Instructions 


CLTS, LMSW, SMSW 
rFLAGS Affected
None
Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid Instruction, #UD | X | X | X | An illegal control register was referenced (CR1, CR5-CR7, CR9-CR15).
Invalid Instruction, #UD | X | X | X | The use of the LOCK prefix to read CR8 is not supported, as indicated by CPUID Fn8000_0001_ECX[AltMovCr8] = 0.
General protection, #GP | | X | X | CPL was not 0.
General protection, #GP | X | | X | An attempt was made to set CR0.PG = 1 and CR0.PE = 0.
General protection, #GP | X | | X | An attempt was made to set CR0.CD = 0 and CR0.NW = 1.
General protection, #GP | X | | X | Reserved bits were set in the page-directory pointers table (used in the legacy extended physical addressing mode) and the instruction modified CR0, CR3, or CR4.
General protection, #GP | X | | X | An attempt was made to write 1 to any reserved bit in CR0, CR3, CR4 or CR8.
General protection, #GP | X | | X | An attempt was made to set CR0.PG while long mode was enabled (EFER.LME = 1), but paging address extensions were disabled (CR4.PAE = 0).
General protection, #GP | | | X | An attempt was made to clear CR4.PAE while long mode was active (EFER.LMA = 1).
General protection, #GP | | | X | An attempt was made to set CR4.PCIDE = 1 when long mode was disabled (EFER.LMA = 0).
General protection, #GP | | | X | An attempt was made to set CR4.PCIDE = 1 when CR3[11:0] <> 0.
General protection, #GP | | | X | An attempt was made to set CR0.PG = 0 when CR4.PCIDE = 1.
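
Several of the #GP conditions for writes to CR0 listed above are simple bit-consistency rules, and can be sketched in Python. This is an illustrative model only: the function name and argument names are hypothetical, reserved-bit and CPL checks are omitted, and the long-mode rule is simplified to "PG = 1 in the new value while EFER.LME = 1 and CR4.PAE = 0".

```python
CR0_PE = 1 << 0   # protection enable
CR0_NW = 1 << 29  # not writethrough
CR0_CD = 1 << 30  # cache disable
CR0_PG = 1 << 31  # paging enable


def cr0_write_faults(new_cr0, cr4_pae, efer_lme, cr4_pcide):
    """Return a list of reasons why writing `new_cr0` would raise #GP."""
    pe = bool(new_cr0 & CR0_PE)
    nw = bool(new_cr0 & CR0_NW)
    cd = bool(new_cr0 & CR0_CD)
    pg = bool(new_cr0 & CR0_PG)

    reasons = []
    if pg and not pe:
        reasons.append("CR0.PG = 1 with CR0.PE = 0")
    if nw and not cd:
        reasons.append("CR0.CD = 0 with CR0.NW = 1")
    if pg and efer_lme and not cr4_pae:
        reasons.append("CR0.PG set with EFER.LME = 1 and CR4.PAE = 0")
    if not pg and cr4_pcide:
        reasons.append("CR0.PG = 0 with CR4.PCIDE = 1")
    return reasons
```

An empty list means none of the modeled conditions apply; it does not imply the write is otherwise legal.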
MOV DRn Move to/from Debug Registers


Moves the contents of a debug register into a 32-bit or 64-bit general-purpose register or vice versa. 


In 64-bit mode, the operand size is fixed at 64 bits without the need for a REX prefix. In non-64-bit 
mode, the operand size is fixed at 32 bits and the upper 32 bits of the destination are forced to 0.


DR0 through DR3 are linear breakpoint address registers. DR6 is the debug status register and DR7 is
the debug control register. DR4 and DR5 are aliased to DR6 and DR7 if CR4.DE = 0, and are reserved
if CR4.DE = 1.


DR8 through DR15 are reserved and generate an undefined opcode exception if referenced. 
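
The register-number rules above can be summarized in a small Python sketch. The function name is hypothetical, and the string "#UD" stands in for the invalid-opcode exception; it relies on the architectural behavior that referencing a reserved debug register raises #UD.

```python
def resolve_debug_register(n, cr4_de):
    """Map an encoded debug register number to the register actually accessed.

    Returns the effective register number, or the string "#UD" when the
    reference is to a reserved register.
    """
    if not 0 <= n <= 15:
        raise ValueError("debug register number must be 0..15")
    if n >= 8:
        return "#UD"  # DR8-DR15 are reserved
    if n in (4, 5):
        # DR4/DR5 alias DR6/DR7 only while debug extensions are disabled.
        return "#UD" if cr4_de else n + 2
    return n
```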


These instructions are privileged and must be executed at CPL 0. 


The MOV DRn, reg32 and MOV DRn, reg64 instructions are serializing instructions.


The MOV(DR) instruction is always treated as a register-to-register (MOD = 11) instruction, 
regardless of the encoding of the MOD field in the MODR/M byte. 


See “Debug and Performance Resources” in Volume 2 for details. 


Mnemonic          Opcode     Description
MOV reg32, DRn    0F 21 /r   Move the contents of DRn to a 32-bit register.
MOV reg64, DRn    0F 21 /r   Move the contents of DRn to a 64-bit register.
MOV DRn, reg32    0F 23 /r   Move the contents of a 32-bit register to DRn.
MOV DRn, reg64    0F 23 /r   Move the contents of a 64-bit register to DRn.


Related Instructions


None


rFLAGS Affected


None
Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Debug, #DB | X | | X | A debug register was referenced while the general detect (GD) bit in DR7 was set.
Invalid opcode, #UD | X | | X | DR4 or DR5 was referenced while the debug extensions (DE) bit in CR4 was set.
Invalid opcode, #UD | | | X | An illegal debug register (DR8-DR15) was referenced.
General protection, #GP | | X | X | CPL was not 0.
General protection, #GP | | | X | A 1 was written to any of the upper 32 bits of DR6 or DR7 in 64-bit mode.
MWAIT Monitor Wait


Used in conjunction with the MONITOR instruction to cause a processor to wait until a store occurs to 
a specific linear address range from another processor. The previously executed MONITOR 
instruction causes the processor to enter the monitor event pending state. The MWAIT instruction may 
enter an implementation dependent power state until the monitor event pending state is exited. The 
MWAIT instruction has the same effect on architectural state as the NOP instruction. 


Events that cause an exit from the monitor event pending state include:
• A store from another processor matches the address range established by the MONITOR
  instruction.
• Any unmasked interrupt, including INTR, NMI, SMI, INIT.
• RESET.
• Any far control transfer that occurs between the MONITOR and the MWAIT.


EAX specifies optional hints for the MWAIT instruction. Optimized C-state request is communicated 
through EAX[7:4]. The processor C-state is EAX[7:4]+1, so to request C0 is to place the value F in
EAX[7:4] and to request C1 is to place the value 0 in EAX[7:4]. All other components of EAX should
be zero when making the C1 request. Setting a reserved bit in EAX is ignored by the processor. This is 
implicitly a 32-bit operand. 


ECX specifies optional extensions for the MWAIT instruction. The only extension currently defined is 
ECX bit 0, which allows interrupts to wake MWAIT, even when eFLAGS.IF = 0. Support for this 
extension is indicated by a feature flag returned by the CPUID instruction. Setting any unsupported
bit in ECX results in a #GP exception. This is implicitly a 32-bit operand. 


CPUID Function 0000_0005h indicates support for extended features of MONITOR/MWAIT: 


• CPUID Fn0000_0005_ECX[EMX] = 1 indicates support for enumeration of MONITOR/MWAIT
  extensions.
• CPUID Fn0000_0005_ECX[IBE] = 1 indicates that MWAIT can set ECX[0] to allow interrupts to
  cause an exit from the monitor event pending state even when eFLAGS.IF = 0.
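
The EAX hint and ECX extension encodings described above can be illustrated with a brief Python sketch. The helper names are hypothetical; the code only computes register values as the text describes them and does not execute MWAIT.

```python
def mwait_eax_for_cstate(cstate):
    """Return the EAX hint requesting C-state `cstate` (0..15).

    The requested C-state is EAX[7:4] + 1, so C0 is encoded as Fh and C1 as 0.
    All other EAX bits are left at zero.
    """
    if not 0 <= cstate <= 15:
        raise ValueError("C-state must be in the range 0..15")
    return ((cstate - 1) & 0xF) << 4


def mwait_ecx(interrupt_break, ibe_supported):
    """Return the ECX extension value.

    ECX[0] = 1 lets interrupts wake MWAIT even when the interrupt flag is 0.
    Setting it without CPUID Fn0000_0005_ECX[IBE] = 1 would be an unsupported
    extension bit, which the text says results in #GP.
    """
    if interrupt_break and not ibe_supported:
        raise ValueError("ECX[0] is not supported on this processor (#GP)")
    return 1 if interrupt_break else 0
```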


The MWAIT instruction can be executed at CPL 0 and is allowed at CPL > 0 only if MSR 
C001_0015h[MonMwaitUserEn] = 1. When MSR C001_0015h[MonMwaitUserEn] is 0, MWAIT
generates #UD at CPL > 0. (See the BIOS and Kernel Developer’s Guide applicable to your product 
for specific details on MSR C001_0015h.) 


Support for the MWAIT instruction is indicated by CPUID Fn0000_0001_ECX[MONITOR] = 1. 
Software must check the CPUID bit once per program or library initialization before using the
MWAIT instruction, or inconsistent behavior may result. Software designed to run at CPL greater than 
0 must also check for availability by testing whether executing MWAIT causes a #UD exception. 


The use of the MWAIT instruction is contingent upon the satisfaction of the following coding 
requirements: 
• MONITOR must precede the MWAIT and occur in the same loop.
• MWAIT must be conditionally executed only if the awaited store has not already occurred. (This
  prevents a race condition between the MONITOR instruction arming the monitoring hardware and
  the store intended to trigger the monitoring hardware.)


The following pseudo-code shows typical usage of a MONITOR/MWAIT pair:


EAX = Linear_Address_to_Monitor;
ECX = 0; // Extensions
EDX = 0; // Hints


WHILE (!matching_store_done) {
    MONITOR EAX, ECX, EDX
    IF (!matching_store_done) {
        MWAIT EAX, ECX
    }
}


Mnemonic   Opcode     Description
MWAIT      0F 01 C9   Causes the processor to stop instruction execution
                      and enter an implementation-dependent optimized
                      state until occurrence of a class of events.


Related Instructions 


MONITOR 
rFLAGS Affected 
None 
Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The MONITOR/MWAIT instructions are not supported, as indicated by CPUID Fn0000_0001_ECX[MONITOR] = 0.
Invalid opcode, #UD | | X | X | CPL was not 0 and MSR C001_0015[MonMwaitUserEn] = 0.
General protection, #GP | X | X | X | Unsupported extension bits were set in ECX.
PSMASH Page Smash


Expands a 2MB-page RMP entry into a corresponding set of contiguous 4KB-page RMP entries. The 
2MB page’s system physical address is specified in the RAX register. 


The new entries inherit the attributes of the original entry. Upon completion, a return code is stored in 
EAX. rFLAGS bits OF, ZF, AF, PF and SF are set based on this return code. 


The PSMASH instruction invalidates all TLB entries in the system that translate to the 2MB page 
being expanded. 


This instruction is intended for hypervisor use. Attempted execution at an ASID other than 0 will 
result in a FAIL_PERMISSION return code.


This is a privileged instruction. Attempted execution at a privilege level other than CPL0 will result in
a #GP(0) exception. In addition, this instruction is only valid in 64-bit mode with SNP enabled; in all 
other modes a #UD exception will be generated. 


Support for this instruction is indicated by CPUID Fn8000_001F_EAX[SNP]=1. 


Mnemonic   Opcode        Description
PSMASH     F3 0F 01 FF   Creates 512 4KB RMP entries from a 2MB RMP entry
Action 


SYSTEM_PA = RAX & ~0x1FFFFF


IF (!64BIT_MODE)                 // Instruction only valid in 64-bit mode
    EXCEPTION [#UD]


IF (!SYSCFG.SNP_EN)              // Instruction only valid when SNP is enabled
    EXCEPTION [#UD]


IF (CPL != 0)                    // Instruction only allowed at CPL 0
    EXCEPTION [#GP(0)]


IF (CURRENT_ASID != 0)           // Instruction only allowed at ASID 0
    EAX = FAIL_PERMISSION
    EXIT


RMP_ENTRY_PA = RMP_BASE + 0x4000 + (SYSTEM_PA / 0x1000) * 16


IF (RMP_ENTRY_PA > RMP_END)      // System address must have an RMP entry
    EAX = FAIL_INPUT
    EXIT


temp_RMP = READ_MEM_PA.o [RMP_ENTRY_PA]


IF (temp_RMP.IMMUTABLE || !temp_RMP.ASSIGNED || (temp_RMP.PAGE_SIZE != 2MB))
    EAX = FAIL_BADADDR
    EXIT


temp_RMP.PAGE_SIZE = 4KB
WRITE_MEM_PA.o [RMP_ENTRY_PA] = temp_RMP


FOR (I = 1; I < 512; I++)
{
    temp_RMP.GUEST_PA = temp_RMP.GUEST_PA + 0x1000;
    WRITE_MEM_PA.o [RMP_ENTRY_PA + I * 16] = temp_RMP;
}
EAX = SUCCESS
EXIT


Return Codes 


Value Name Description
0 SUCCESS Successful completion 
1 FAIL_INPUT Illegal input parameters 
2 FAIL_PERMISSION Current ASID not 0 
3 FAIL_INUSE Another processor is modifying the same RMP entry 
4 FAIL_BADADDR The page did not meet smashing criteria 
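
The Action pseudo-code and return codes above can be modeled in Python to make the address arithmetic and the expansion loop concrete. This is a sketch under stated assumptions: the RMP is represented as a plain dictionary keyed by entry physical address, the field names (`immutable`, `assigned`, `page_size`, `guest_pa`) are illustrative stand-ins for the RMP entry fields, and the #UD/#GP checks and the FAIL_INUSE case are not modeled.

```python
RMP_ENTRY_SIZE = 16
SUCCESS, FAIL_INPUT, FAIL_PERMISSION, FAIL_INUSE, FAIL_BADADDR = range(5)


def rmp_entry_pa(rmp_base, rax):
    """Physical address of the RMP entry covering the 2MB page in RAX."""
    system_pa = rax & ~0x1FFFFF  # align down to a 2MB boundary
    return rmp_base + 0x4000 + (system_pa // 0x1000) * RMP_ENTRY_SIZE


def psmash(rmp, rmp_base, rmp_end, rax, current_asid=0):
    """Expand one 2MB RMP entry into 512 4KB entries; returns a return code."""
    if current_asid != 0:
        return FAIL_PERMISSION
    entry_pa = rmp_entry_pa(rmp_base, rax)
    if entry_pa > rmp_end:
        return FAIL_INPUT
    entry = rmp[entry_pa]
    if entry["immutable"] or not entry["assigned"] or entry["page_size"] != "2MB":
        return FAIL_BADADDR
    # Rewrite the first entry as a 4KB entry, then derive the other 511.
    entry = dict(entry, page_size="4KB")
    rmp[entry_pa] = entry
    for i in range(1, 512):
        entry = dict(entry, guest_pa=entry["guest_pa"] + 0x1000)
        rmp[entry_pa + i * RMP_ENTRY_SIZE] = entry
    return SUCCESS
```

Each new entry inherits the attributes of the original and differs only in its guest physical address, which advances by 4KB per entry.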


Related Instructions 


RMPUPDATE, PVALIDATE, RMPADJUST 
rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     |    |    |    |    |       | M  |    |    |    | M  | M  | M  | M  |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.


Exceptions

Exception | Real | Virtual 8086 | Protected | Cause of Exception
Invalid opcode, #UD | X | X | X | The SNP instructions are not supported as indicated by CPUID Fn8000_001F_EAX[SNP] = 0
Invalid opcode, #UD | X | X | X | This instruction is only recognized in 64-bit mode
Invalid opcode, #UD | | | X | SYSCFG[SNP_EN] was not set to 1
General Protection, #GP | | | X | CPL was not 0
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PVALIDATE Page Validate 


Validates or rescinds validation of a guest page’s RMP entry. The guest virtual address is specified in 
the register operand rAX. The portion of RAX used to form the address is determined by the effective 
address size (current execution mode and optional address size prefix). The page size is specified in 
ECX[0]. The new RMP Validated bit is specified in EDX[0]. 


The PVALIDATE instruction is used by an SNP-active guest to modify the validation status of a page. 
The PVALIDATE instruction will attempt to access the provided page and will take a 
#VMEXIT(NPF) if a nested translation error occurs or the translated address is outside the range of
memory covered by the RMP. Assuming no error is detected, the PVALIDATE instruction will store 
EDX[0] to the Validated bit in the page’s RMP entry. 


Upon completion, a return code is stored in EAX. rFLAGS bits OF, ZF, AF, PF and SF are set based on 
this return code. If the instruction completed successfully, the rFLAGS bit CF indicates if the contents 
of the RMP entry were changed or not. 


While this instruction is intended for use in SNP-active guest system software, it is recognized in any 
operating mode at CPL0. If the PVALIDATE instruction is executed by an SNP-active guest and
changes the Validated bit in the RMP entry, upon completion it sets rFLAGS.CF to 0. If the 
PVALIDATE instruction is executed in a non-SNP-active environment or does not change the 
Validated bit in the RMP entry, it sets rFLAGS.CF to 1 and otherwise behaves as a NOP instruction. 


This is a privileged instruction. Attempted execution at a privilege level other than CPL0 will result in
a #GP(0) exception. 


PVALIDATE performs the same segmentation and paging checks as a 1-byte read. PVALIDATE does 
not invalidate TLB caches. 


Support for this instruction is indicated by CPUID Fn8000_001F_EAX[SNP]=1. 


Mnemonic Opcode Description 
PVALIDATE F2 0F 01 FF Performs guest page validation
Action 


GUEST_VA = rAX & ~0xFFF
PAGE_SIZE = ECX[0]
VALIDATE_PAGE = EDX[0]

IF (CPL != 0)                   // This instruction is only allowed at CPL 0
    EXCEPTION [#GP(0)]

IF (!SNP_ACTIVE)
    rFLAGS.CF = 1               // Set CF to indicate that the RMP was not changed
    EAX = SUCCESS
    EXIT

IF (CURRENT_VMPL != 0)          // This instruction is only allowed at VMPL 0
    EXCEPTION [#GP(0)]

IF ((PAGE_SIZE == 2MB) && (GUEST_VA[20:12] != 0))
    EAX = FAIL_INPUT            // Page size is 2MB and page is not 2MB aligned
    EXIT

(SYSTEM_PA, GUEST_PA) = TRANSLATE(GUEST_VA)
RMP_ENTRY_PA = RMP_BASE + 0x4000 + (SYSTEM_PA / 0x1000) * 16

IF (RMP_ENTRY_PA > RMP_END)     // Translated system address must have an RMP entry
    #VMEXIT(NPF)

temp_RMP = READ_MEM_PA.o [RMP_ENTRY_PA]

IF (temp_RMP.IMMUTABLE || !temp_RMP.ASSIGNED ||
    (temp_RMP.GUEST_PA != GUEST_PA) || (temp_RMP.ASID != ASID) ||
    (temp_RMP.PAGE_SIZE != nPT page size) ||
    ((temp_RMP.PAGE_SIZE == 2MB) && (PAGE_SIZE == 4KB)))
    #VMEXIT(NPF)

IF ((temp_RMP.PAGE_SIZE == 4KB) && (PAGE_SIZE == 2MB))
    EAX = FAIL_SIZEMISMATCH     // 2MB validation backed by 4KB pages
    EXIT

IF (temp_RMP.VALIDATED == VALIDATE_PAGE)
    rFLAGS.CF = 1
ELSE
    rFLAGS.CF = 0

temp_RMP.VALIDATED = VALIDATE_PAGE
WRITE_MEM_PA.o [RMP_ENTRY_PA] = temp_RMP
EAX = SUCCESS
EXIT
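
The relationship between the Validated bit and rFLAGS.CF on the success path is easy to misread, because CF = 1 means "no change". The sketch below models only that final step of the Action section. The function name pvalidate_update is hypothetical, and all translation, permission, and size checks are left out.

```c
#include <stdbool.h>

/* Models the last step of the pseudocode: stores the requested value in
   the Validated bit and returns the resulting CF.
   CF = 1: the bit already had the requested value (RMP not changed).
   CF = 0: the bit was changed. */
bool pvalidate_update(bool *validated, bool validate_page)
{
    bool cf = (*validated == validate_page);
    *validated = validate_page;
    return cf;
}
```

Under this model, validating an already validated page still completes with SUCCESS but reports CF = 1.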


Return Codes 


Value  Name               Description
0      SUCCESS            Successful completion (regardless of whether Validated bit changed state)
1      FAIL_INPUT         Illegal input parameters
6      FAIL_SIZEMISMATCH  Page size mismatch between guest (2MB) and RMP entry (4KB)


Related Instructions 


RMPUPDATE, PSMASH, RMPADJUST 
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rFLAGS Affected 


ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          M                   M    M    M    M    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X             X          The SNP instructions are not supported as indicated by CPUID Fn8000_001F_EAX[SNP] = 0
General Protection, #GP        X             X          CPL was not 0
                         X     X             X          Current VMPL was not zero
Page Fault, #PF                X             X          A page fault resulted from the execution of the instruction
                                             X          The effective C-bit was a 0 during the guest page table walk
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RDMSR Read Model-Specific Register 


Loads the contents of a 64-bit model-specific register (MSR) specified in the ECX register into
registers EDX:EAX. The EDX register receives the high-order 32 bits and the EAX register receives
the low-order 32 bits. The RDMSR instruction ignores operand size; ECX always holds the MSR number,
and EDX:EAX holds the data. If a model-specific register has fewer than 64 bits, the unimplemented
bit positions loaded into the destination registers are undefined.
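
Software usually recombines the EDX:EAX pair into a single 64-bit value. A minimal sketch of that step follows; the function name is hypothetical, and the two arguments stand for register values that have already been captured after the instruction executed.

```c
#include <stdint.h>

/* EDX supplies bits 63:32 and EAX supplies bits 31:0. */
uint64_t value_from_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```

The same recombination applies to any instruction that returns a 64-bit result in EDX:EAX.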


This instruction must be executed at a privilege level of 0 or a general protection exception (#GP) will 
be raised. This exception is also generated if a reserved or unimplemented model-specific register is 
specified in ECX. 


Support for the RDMSR instruction is indicated by CPUID Fn0000_0001_EDX[MSR] = 1 OR 
CPUID Fn8000_0001_EDX[MSR] = 1. For more information on using the CPUID instruction, see the 
description of the CPUID instruction on page 165. 


For more information about model-specific registers, see the documentation for various hardware
implementations and “Model-Specific Registers (MSRs)” in Volume 2: System Programming.


Mnemonic Opcode Description 


RDMSR 0F 32 Copy MSR specified by ECX into EDX:EAX.


Related Instructions 


WRMSR, RDTSC, RDPMC 


rFLAGS Affected 
None 
Exceptions 
Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X             X          The RDMSR instruction is not supported, as indicated by CPUID Fn0000_0001_EDX[MSR] = 0 or CPUID Fn8000_0001_EDX[MSR] = 0.
General protection, #GP        X             X          CPL was not 0.
                         X                   X          The value in ECX specifies a reserved or unimplemented MSR address.




RDPKRU Read Protection Key Rights 


Loads the contents of the 32-bit Protection Key Rights (PKRU) register into RAX[31:0] and clears the 
upper 32 bits of RAX. RDX is also cleared to 0. The RDPKRU instruction ignores operand size. 


This instruction must be executed with ECX=0, otherwise a general protection fault (#GP) is
generated. The upper 32 bits of RCX are ignored. Memory protection keys must be enabled
(CR4.PKE=1), otherwise executing this instruction generates an invalid opcode fault (#UD).


Software can check that the operating system has enabled memory protection keys (CR4.PKE=1) by 
testing CPUID Function 0000_0007h_ECX[OSPKE]. (See Section 5, “Protection Key Rights for 
User Pages” in AMD64 Architecture Programmer’s Manual Volume 2 for more information on 
memory protection keys.) 


RDPKRU can be executed at any privilege level. 


Mnemonic Opcode Description 
RDPKRU 0F 01 EE Read the PKRU MSR into EAX and clear RDX
Related Instructions 


WRPKRU 


rFLAGS Affected 


None 


Exceptions 


Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X             X          CR4.PKE=0
General protection, #GP                      X          ECX was not zero
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RDPMC Read Performance-Monitoring Counter 


Reads the contents of a 64-bit performance counter and returns it in the registers EDX:EAX. The ECX 
register is used to specify the index of the performance counter to be read. The EDX register receives 
the high-order 32 bits and the EAX register receives the low order 32 bits of the counter. The RDPMC 
instruction ignores operand size; the index and the return values are all 32 bits. 


The base architecture supports four core performance counters: PerfCtr0-3. An extension to the
architecture increases the number of core performance counters to 6 (PerfCtr0-5). Other extensions
add four northbridge performance counters NB_PerfCtr0-3 and four L2 cache performance counters
L2I_PerfCtr0-3.


To select the core performance counter to be read, specify the counter index, rather than the 
performance counter MSR address. To access the northbridge performance counters, specify the index 
of the counter plus 6. To access the L2 cache performance counters, specify the index of the counter 
plus 10. 
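
The index rules above can be written as a small helper. This is a sketch under stated assumptions: the enum and function names are hypothetical, and the range limits reflect only the counter counts named in this description (six core, four northbridge, four L2). Whether a given counter is actually implemented must still be determined through CPUID.

```c
/* Counter families named in the description above. */
enum pmc_kind { PMC_CORE, PMC_NORTHBRIDGE, PMC_L2 };

/* Returns the value to place in ECX for RDPMC, or -1 if n is outside the
   range described in the text for that family. */
int rdpmc_index(enum pmc_kind kind, unsigned n)
{
    switch (kind) {
    case PMC_CORE:        return n < 6 ? (int)n        : -1; /* PerfCtr0-5 */
    case PMC_NORTHBRIDGE: return n < 4 ? (int)(n + 6)  : -1; /* NB_PerfCtr0-3 */
    case PMC_L2:          return n < 4 ? (int)(n + 10) : -1; /* L2I_PerfCtr0-3 */
    }
    return -1;
}
```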


Programs running at any privilege level can read performance monitor counters if the PCE flag in CR4 
is set to 1; otherwise this instruction must be executed at a privilege level of 0. 


This instruction is not serializing. Therefore, there is no guarantee that all instructions have completed 
at the time the performance counter is read. 


For more information about performance-counter registers, see the documentation for various 
hardware implementations and “Performance Counters” in Volume 2. 


Support for performance counters beyond PerfCtr0-3 is indicated as follows:

• CPUID Fn8000_0001_ECX[PerfCtrExtCore] = 1 indicates support for core performance counters
  PerfCtr4-5.
• CPUID Fn8000_0001_ECX[PerfCtrExtNB] = 1 indicates support for the 4 architecturally defined
  Northbridge performance counters.
• CPUID Fn8000_0022_EBX[NumDfPmc] > 4 indicates support for additional Northbridge
  performance counters.
• CPUID Fn8000_0001_ECX[PerfCtrExtLLC] = 1 indicates support for the six L3 performance
  counters.


For more information on using the CPUID instruction, see the description of the CPUID instruction on 
page 165. 


Instruction Encoding 


Mnemonic Opcode Description 


RDPMC 0F 33 Copy the performance monitor counter specified by ECX into EDX:EAX.
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Related Instructions 


RDMSR, WRMSR 


rFLAGS Affected 


None 


Exceptions 


Exception                Real  Virtual 8086  Protected  Cause of Exception
General Protection, #GP  X     X             X          The value in ECX specified an unimplemented performance counter number.
                               X             X          CPL was not 0 and CR4.PCE = 0.
RDSSP Read Shadow Stack Pointer


Reads the current Shadow Stack Pointer (SSP) into the specified GPR. The operand size is 64 bits in
64-bit mode when REX.W=1 and is 32 bits in all other cases. RDSSP is treated as a NOP if CR4.CET = 0
or if shadow stacks are not enabled at the current privilege level.


Mnemonic Opcode Description 
RDSSPD reg32 F3 0F 1E /1 Read SSP[31:0] to reg32
RDSSPQ reg64 F3 0F 1E /1 Read SSP[63:0] to reg64
Action
IF (((CPL == 3) && SSTK_USER_ENABLED) || ((CPL != 3) && SSTK_SUPV_ENABLED))
    IF (OPERAND_SIZE == 64)
        reg64 = SSP
    ELSE
        reg32 = SSP[31:0]

EXIT


Related Instructions 
RDSSP, RSTORSSP 


rFLAGS Affected 


None 


Exceptions 


None. 
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RDTSC Read Time-Stamp Counter 


Loads the value of the processor’s 64-bit time-stamp counter into registers EDX:EAX. 


The time-stamp counter (TSC) is contained in a 64-bit model-specific register (MSR). The processor 
sets the counter to 0 upon reset and increments the counter every clock cycle. INIT does not modify the 
TSC. 


The high-order 32 bits are loaded into EDX, and the low-order 32 bits are loaded into the EAX 
register. This instruction ignores operand size. 


When the time-stamp disable flag (TSD) in CR4 is set to 1, the RDTSC instruction can only be used at 
privilege level 0. If the TSD flag is 0, this instruction can be used at any privilege level. 


This instruction is not serializing. Therefore, there is no guarantee that all instructions have completed 
at the time the time-stamp counter is read. 


The behavior of the RDTSC instruction is implementation dependent. The TSC counts at a constant 
rate, but may be affected by power management events (such as frequency changes), depending on the 
processor implementation. If CPUID Fn8000_0007_EDX[TscInvariant] = 1, then the TSC rate is 
ensured to be invariant across all P-States, C-States, and stop-grant transitions (such as STPCLK 
Throttling); therefore, the TSC is suitable for use as a source of time. Consult the BIOS and Kernel 
Developer’s Guide applicable to your product for information concerning the effect of power 
management on the TSC. 
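
When the TSC rate is invariant, a difference between two readings can be converted to elapsed time. The sketch below assumes the caller already knows the TSC frequency. The tsc_hz parameter and the function name are hypothetical, and RDTSC itself does not report the frequency. The arithmetic further assumes that tsc_hz is nonzero and below roughly 18 GHz, so that the intermediate product fits in 64 bits.

```c
#include <stdint.h>

/* Converts the interval between two TSC readings to nanoseconds.
   Unsigned subtraction is modular, so a single counter wrap between
   the two readings is tolerated. */
uint64_t tsc_interval_ns(uint64_t start, uint64_t end, uint64_t tsc_hz)
{
    uint64_t ticks = end - start;
    uint64_t secs  = ticks / tsc_hz;    /* whole seconds */
    uint64_t rem   = ticks % tsc_hz;    /* leftover ticks, less than tsc_hz */

    return secs * UINT64_C(1000000000)
         + (rem * UINT64_C(1000000000)) / tsc_hz;
}
```

Splitting the interval into whole seconds and a remainder avoids overflowing the multiplication for long intervals.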


Support for the RDTSC instruction is indicated by CPUID Fn0000_0001_EDX[TSC] = 1 OR CPUID 
Fn8000_0001_EDX[TSC] = 1. For more information on using the CPUID instruction, see the 
description of the CPUID instruction on page 165. 


Instruction Encoding 


Mnemonic Opcode Description 


RDTSC 0F 31 Copy the time-stamp counter into EDX:EAX.
Related Instructions 
RDTSCP, RDMSR, WRMSR 


rFLAGS Affected 


None 
Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X             X          The RDTSC instruction is not supported, as indicated by CPUID Fn0000_0001_EDX[TSC] = 0 OR CPUID Fn8000_0001_EDX[TSC] = 0.
General protection, #GP        X             X          CPL was not 0 and CR4.TSD = 1.
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RDTSCP Read Time-Stamp Counter 
and Processor ID 


Loads the value of the processor’s 64-bit time-stamp counter into registers EDX:EAX, and loads the 
value of TSC_AUX into ECX. This instruction ignores operand size. 


The time-stamp counter is contained in a 64-bit model-specific register (MSR). The processor sets the 
counter to 0 upon reset and increments the counter every clock cycle. INIT does not modify the TSC. 


The high-order 32 bits are loaded into EDX, and the low-order 32 bits are loaded into the EAX 
register. 


The TSC_AUX value is contained in the low-order 32 bits of the TSC_AUX register (MSR address 
C000_0103h). This MSR is initialized by privileged software to any meaningful value, such as a 
processor ID, that software wants to associate with the returned TSC value. 


When the time-stamp disable flag (TSD) in CR4 is set to 1, the RDTSCP instruction can only be used 
at privilege level 0. If the TSD flag is 0, this instruction can be used at any privilege level. 


Unlike the RDTSC instruction, RDTSCP forces all older instructions to retire before reading the
time-stamp counter.


The behavior of the RDTSCP instruction is implementation dependent. The TSC counts at a constant 
rate, but may be affected by power management events (such as frequency changes), depending on the 
processor implementation. If CPUID Fn8000_0007_EDX[TscInvariant] = 1, then the TSC rate is 
ensured to be invariant across all P-States, C-States, and stop-grant transitions (such as STPCLK 
Throttling); therefore, the TSC is suitable for use as a source of time. Consult the BIOS and Kernel 
Developer’s Guide applicable to your product for information concerning the effect of power 
management on the TSC. 


Support for the RDTSCP instruction is indicated by CPUID Fn8000_0001_EDX[RDTSCP] = 1. For 
more information on using the CPUID instruction, see the description of the CPUID instruction on 
page 165. 


Instruction Encoding 


Mnemonic Opcode Description 


RDTSCP 0F 01 F9 Copy the time-stamp counter into EDX:EAX and the TSC_AUX register into ECX.


Related Instructions 
RDTSC 


rFLAGS Affected 


None 
Exceptions

Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X             X          The RDTSCP instruction is not supported, as indicated by CPUID Fn8000_0001_EDX[RDTSCP] = 0.
General protection, #GP        X             X          CPL was not 0 and CR4.TSD = 1.
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RMPADJUST Adjust RMP Permissions 


Modifies RMP permissions for a guest page. The guest virtual address is specified in the RAX register. 
The page size is specified in RCX[0]. The target VMPL and its permissions are specified in the RDX 
register as follows: 


RDX bits  Field              Description
[63:17]   RESERVED
[16]      VMSA               Indicates if the page may be used as a VM Save Area page.
                             This bit is ignored whenever the current VMPL is not 0.
[15:8]    TARGET_PERM_MASK   Desired permission mask settings
[7:0]     TARGET_VMPL        Target VMPL
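
As an illustration of the RDX layout, the following sketch packs the three fields into a 64-bit value. The function name rmpadjust_rdx is hypothetical. The meaning of the individual permission mask bits is not defined in this section, so the mask is treated as an opaque 8-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Builds the RDX operand: TARGET_VMPL in bits 7:0, TARGET_PERM_MASK in
   bits 15:8, VMSA in bit 16. Reserved bits 63:17 are left zero. */
uint64_t rmpadjust_rdx(uint8_t target_vmpl, uint8_t perm_mask, bool vmsa)
{
    return (uint64_t)target_vmpl
         | ((uint64_t)perm_mask << 8)
         | ((uint64_t)(vmsa ? 1 : 0) << 16);
}
```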


The RMPADJUST instruction is used by an SNP-active guest to modify RMP permissions of a lesser- 
privileged VMPL. The RMPADJUST instruction will attempt to access the specified page and will 
take a #VMEXIT(NPF) if a nested translation error occurs or the translated address is outside the 
range of memory covered by the RMP. Assuming no such error is detected, the target VMPL is 
numerically higher than the current VMPL, and the specified permissions for the target VMPL are not 
greater than the permissions of the current VMPL, the RMPADJUST instruction will modify the target 
permission mask in the RMP entry. 


Upon completion, a return code is stored in EAX. rFLAGS bits OF, ZF, AF, PF and SF are set based on 
this return code. 


RMPADJUST performs the same segmentation and paging checks as a 1-byte read. RMPADJUST 
does not invalidate TLB caches. 


This is a privileged instruction. Attempted execution at a privilege level other than CPL0 will result in
a #GP(0) exception. In addition, this instruction is only valid in 64-bit mode in an SNP-active guest; in 
all other modes a #UD exception will be generated. 


Support for this instruction is indicated by CPUID Fn8000_001F_EAX[SNP]=1. 


Mnemonic Opcode Description 
RMPADJUST F3 0F 01 FE Modifies RMP permissions
Action 


GUEST_VA = RAX & ~0xFFF
PAGE_SIZE = RCX[0]
TARGET_VMPL = RDX[7:0]
TARGET_PERM_MASK = RDX[15:8]
VMSA = RDX[16]

IF (!64BIT_MODE)                // Instruction only valid in 64-bit mode
    EXCEPTION [#UD]

IF (!SNP_ACTIVE)
    EXCEPTION [#UD]

IF (CPL != 0)                   // Instruction only allowed at CPL 0
    EXCEPTION [#GP(0)]

IF ((PAGE_SIZE == 2MB) && (GUEST_VA[20:12] != 0))
    EAX = FAIL_INPUT            // Page size is 2MB and not 2MB aligned
    EXIT

IF (TARGET_VMPL <= CURRENT_VMPL)    // Only permissions for numerically
    EAX = FAIL_PERMISSION           // higher VMPL can be modified
    EXIT

(SYSTEM_PA, GUEST_PA) = TRANSLATE(GUEST_VA)
RMP_ENTRY_PA = RMP_BASE + 0x4000 + (SYSTEM_PA / 0x1000) * 16

IF (RMP_ENTRY_PA > RMP_END)     // Translated system address
    #VMEXIT(NPF)                // must have an RMP entry

temp_RMP = READ_MEM_PA.o [RMP_ENTRY_PA]

IF (temp_RMP.IMMUTABLE || !temp_RMP.ASSIGNED ||
    (temp_RMP.GUEST_PA != GUEST_PA) || (temp_RMP.ASID != ASID) ||
    (temp_RMP.PAGE_SIZE != nPT page size) ||
    ((temp_RMP.PAGE_SIZE == 2MB) && (PAGE_SIZE == 4KB)))
    #VMEXIT(NPF)

IF (!temp_RMP.VALIDATED)
    #VC(PAGE_NOT_VALIDATED)

IF ((temp_RMP.PAGE_SIZE == 4KB) && (PAGE_SIZE == 2MB))
    EAX = FAIL_SIZEMISMATCH
    EXIT

IF (TARGET_PERM_MASK & ~temp_RMP.PERMISSIONS[CURRENT_VMPL])
    EAX = FAIL_PERMISSION
    EXIT

IF (CURRENT_VMPL == 0)
    temp_RMP.VMSA = VMSA

temp_RMP.PERMISSIONS[TARGET_VMPL] = TARGET_PERM_MASK
WRITE_MEM_PA.o [RMP_ENTRY_PA] = temp_RMP

EAX = SUCCESS
EXIT
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Return Codes 


Value  Name               Description
0      SUCCESS            Successful completion
1      FAIL_INPUT         Illegal input parameters
2      FAIL_PERMISSION    Insufficient permissions
6      FAIL_SIZEMISMATCH  Page size mismatch between guest and RMP


Related Instructions 


PVALIDATE, RMPUPDATE, PSMASH 


rFLAGS Affected 
ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          M                   M    M    M    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.
Exceptions 
Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X             X          The SNP instructions are not supported as indicated by CPUID Fn8000_001F_EAX[SNP] = 0
                         X     X             X          This instruction is only recognized in 64-bit mode
                                             X          Guest is not SNP-Active
General Protection, #GP                      X          CPL was not 0
Page Fault, #PF                              X          A page fault resulted from the execution of the instruction
                                             X          The effective C-bit was a 0 during the guest page table walk
VMM Communication, #VC                       X          RMP.VALIDATED was not set to 1
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RMPUPDATE Write RMP Entry 


Writes a new RMP entry. The system physical address of a page whose RMP entry is modified is
specified in the RAX register. The RCX register provides the effective address of a 16-byte data 
structure which contains the new RMP state. The DS segment is the default segment used to create the 
linear address, but may be overridden by a segment prefix. The layout of the data structure with the 
new RMP state is as follows: 


Byte    Length
Offset  (bytes)  Name        Description
00h     8        GUEST_PA    Guest physical address
08h     1        ASSIGNED    Assigned flag (bit 0)
09h     1        PAGE_SIZE   Page size (0 = 4KB, 1 = 2MB) (bit 0)
0Ah     1        IMMUTABLE   Immutable flag (bit 0)
0Bh     1        -           Reserved (SBZ)
0Ch     4        ASID        ASID of intended page owner
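
The 16-byte structure can be expressed as a C type whose layout is checked at compile time. This is a sketch: the type and function names are hypothetical, and the layout relies on the compiler's natural alignment rules, which the static assertions verify. The helper mirrors one of the FAIL_INPUT checks in the Action section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory image of the new RMP state referenced by RCX. */
typedef struct {
    uint64_t guest_pa;    /* 00h: guest physical address */
    uint8_t  assigned;    /* 08h: Assigned flag in bit 0 */
    uint8_t  page_size;   /* 09h: 0 = 4KB, 1 = 2MB, in bit 0 */
    uint8_t  immutable;   /* 0Ah: Immutable flag in bit 0 */
    uint8_t  reserved;    /* 0Bh: should be zero */
    uint32_t asid;        /* 0Ch: ASID of intended page owner */
} rmpupdate_state;

_Static_assert(sizeof(rmpupdate_state) == 16, "structure must be 16 bytes");
_Static_assert(offsetof(rmpupdate_state, asid) == 0x0C, "ASID must be at 0Ch");

/* Returns false for the combination that the pseudocode rejects with
   FAIL_INPUT: an unassigned entry that is immutable or has a nonzero ASID. */
bool rmpupdate_state_passes_assigned_check(const rmpupdate_state *s)
{
    bool assigned  = (s->assigned & 1) != 0;
    bool immutable = (s->immutable & 1) != 0;

    return assigned || !(immutable || s->asid != 0);
}
```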


The RMPUPDATE instruction checks that the new RMP state is legal before it updates the RMP table.


Upon completion, a return code is stored in EAX. rFLAGS bits OF, ZF, AF, PF and SF are set based on 
this return code. 


The RMPUPDATE instruction invalidates all TLB entries in the system that translate to the page being 
modified. 


This instruction is intended for hypervisor use. Attempted execution at an ASID other than 0 will
result in a FAIL_PERMISSION return code.


This is a privileged instruction. Attempted execution at a privilege level other than CPL0 will result in
a #GP(0) exception. In addition, this instruction is only valid in 64-bit mode with SNP enabled; in all 
other modes a #UD exception will be generated. 


Support for this instruction is indicated by the feature flag CPUID Fn8000_001F_EAX[SNP]=1. 


Mnemonic Opcode Description 
RMPUPDATE F2 0F 01 FE Writes a new RMP entry
Action 


SYSTEM_PA = RAX & ~0xFFF
NEW_RMP_PTR = RCX

IF (!64BIT_MODE)                // Instruction only valid in 64-bit mode
    EXCEPTION [#UD]

IF (!SYSCFG.SNP_EN)             // Instruction only valid when SNP enabled
    EXCEPTION [#UD]

IF (CPL != 0)                   // Instruction only allowed at CPL 0
    EXCEPTION [#GP(0)]

IF (CURRENT_ASID != 0)          // Instruction only allowed at ASID 0
    EAX = FAIL_PERMISSION
    EXIT

NEW_RMP = READ_MEM.o [NEW_RMP_PTR]

IF ((NEW_RMP.PAGE_SIZE == 2MB) && (SYSTEM_PA[20:12] != 0))
    EAX = FAIL_INPUT
    EXIT

IF (!NEW_RMP.ASSIGNED && (NEW_RMP.IMMUTABLE || (NEW_RMP.ASID != 0)))
    EAX = FAIL_INPUT
    EXIT

RMP_ENTRY_PA = RMP_BASE + 0x4000 + (SYSTEM_PA / 0x1000) * 16

IF (RMP_ENTRY_PA > RMP_END)     // System address must have an RMP entry
    EAX = FAIL_INPUT
    EXIT

OLD_RMP = READ_MEM_PA.o [RMP_ENTRY_PA]

IF (OLD_RMP.IMMUTABLE)
    EAX = FAIL_PERMISSION
    EXIT

IF (NEW_RMP.PAGE_SIZE == 4KB)
    IF ((SYSTEM_PA[20:12] == 0) && (OLD_RMP.PAGE_SIZE == 2MB))
        EAX = FAIL_OVERLAP
        EXIT
ELSE
    IF (Any 4KB RMP entry with (RMP.ASSIGNED == 1) exists in 2MB region)
        EAX = FAIL_OVERLAP
        EXIT
    ELSE
        FOR (I = 1; I < 512; I++)
        {
            temp_RMP = 0
            temp_RMP.ASSIGNED = NEW_RMP.ASSIGNED
            WRITE_MEM.o [RMP_ENTRY_PA + I * 16] = temp_RMP;
        }

IF (!NEW_RMP.ASSIGNED)
    temp_RMP = 0
ELSE
    temp_RMP.ASID = NEW_RMP.ASID
    temp_RMP.GUEST_PA = NEW_RMP.GUEST_PA
    temp_RMP.PAGE_SIZE = NEW_RMP.PAGE_SIZE
    temp_RMP.ASSIGNED = NEW_RMP.ASSIGNED
    temp_RMP.IMMUTABLE = NEW_RMP.IMMUTABLE

temp_RMP.VALIDATED = OLD_RMP.VALIDATED
temp_RMP.PERMISSIONS = OLD_RMP.PERMISSIONS
temp_RMP.VMSA = OLD_RMP.VMSA

IF (NEW_RMP.ASID == 0)
    temp_RMP.GUEST_PA = 0

IF ((OLD_RMP.ASID ^ NEW_RMP.ASID) ||
    (OLD_RMP.GUEST_PA ^ NEW_RMP.GUEST_PA) ||
    (OLD_RMP.PAGE_SIZE ^ NEW_RMP.PAGE_SIZE) ||
    (OLD_RMP.ASSIGNED ^ NEW_RMP.ASSIGNED))
    N = CPUID Fn8000_001F_EBX[15:12]
    temp_RMP.VALIDATED = 0
    temp_RMP.VMSA = 0
    temp_RMP.PERMISSIONS[0] = 0xF
    temp_RMP.PERMISSIONS[1:(N-1)] = 0

WRITE_MEM_PA.o [RMP_ENTRY_PA] = temp_RMP

EAX = SUCCESS
EXIT


Return Codes 


Value  Name             Description
0      SUCCESS          Successful completion
1      FAIL_INPUT       Illegal input parameters
2      FAIL_PERMISSION  Current ASID not 0 or RMP entry is Immutable
3      FAIL_INUSE       Another processor is modifying the same RMP entry
4      FAIL_OVERLAP     4KB page and 2MB page RMP overlap detected


Related Instructions 


PVALIDATE, PSMASH, RMPADJUST 
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rFLAGS Affected 


ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          M                   M    M    M    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 
Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD      X     X             X          The SNP instructions are not supported as indicated by CPUID Fn8000_001F_EAX[SNP] = 0
                         X     X             X          This instruction is only recognized in 64-bit mode
                                             X          SYSCFG[SNP_EN] was not set to 1
General Protection, #GP                      X          CPL was not 0
                                             X          A null data segment was used to reference memory
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RSM Resume from System Management Mode 


Resumes an operating system or application procedure previously interrupted by a system 
management interrupt (SMI). The processor state is restored from the information saved when the SMI 
was taken. The processor goes into a shutdown state if it detects invalid state information in the system 
management mode (SMM) save area during RSM. 


RSM will shut down if any of the following conditions are found in the save map (SSM): 


• An illegal combination of flags in CR0 (CR0.PG = 1 and CR0.PE = 0, or CR0.NW = 1 and
  CR0.CD = 0).
• A reserved bit in CR3, CR4, or the extended feature enable register (EFER) is set to 1.
• A reserved bit in the range 63:32 of CR0, DR6, or DR7 is set to 1.
• The following bit combination occurs: EFER.LME = 1, CR0.PG = 1, CR4.PAE = 0.
• The following bit combination occurs: EFER.LME = 1, CR0.PG = 1, CR4.PAE = 1, CS.D = 1,
  CS.L = 1.
• SMM revision field has been modified.
• The following bit combination occurs: CR4.PCIDE = 1 and EFER.LMA = 0.
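
The mode-bit combinations in the list above can be expressed as a predicate over saved register values. The sketch below is illustrative only: the structure smm_saved_state and the function name are hypothetical and do not reflect the SMM save-area layout. The reserved-bit and SMM-revision conditions are not modeled. The check for CR0.PG = 1 with CR0.PE = 0 is skipped when EFER.SVME is 1, on the assumption that this is how the paged-real-mode allowance for RSM applies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in CR0, CR4, and EFER. */
#define CR0_PE    (UINT64_C(1) << 0)
#define CR0_NW    (UINT64_C(1) << 29)
#define CR0_CD    (UINT64_C(1) << 30)
#define CR0_PG    (UINT64_C(1) << 31)
#define CR4_PAE   (UINT64_C(1) << 5)
#define CR4_PCIDE (UINT64_C(1) << 17)
#define EFER_LME  (UINT64_C(1) << 8)
#define EFER_LMA  (UINT64_C(1) << 10)
#define EFER_SVME (UINT64_C(1) << 12)

/* Hypothetical container for the values RSM would restore. */
typedef struct {
    uint64_t cr0, cr4, efer;
    bool cs_d, cs_l;
} smm_saved_state;

/* Returns true if one of the listed mode-bit combinations is present. */
bool rsm_mode_bits_cause_shutdown(const smm_saved_state *s)
{
    bool pe    = (s->cr0 & CR0_PE) != 0;
    bool pg    = (s->cr0 & CR0_PG) != 0;
    bool nw    = (s->cr0 & CR0_NW) != 0;
    bool cd    = (s->cr0 & CR0_CD) != 0;
    bool pae   = (s->cr4 & CR4_PAE) != 0;
    bool pcide = (s->cr4 & CR4_PCIDE) != 0;
    bool lme   = (s->efer & EFER_LME) != 0;
    bool lma   = (s->efer & EFER_LMA) != 0;
    bool svme  = (s->efer & EFER_SVME) != 0;

    if (pg && !pe && !svme)                      return true;
    if (nw && !cd)                               return true;
    if (lme && pg && !pae)                       return true;
    if (lme && pg && pae && s->cs_d && s->cs_l)  return true;
    if (pcide && !lma)                           return true;
    return false;
}
```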


RSM cannot modify EFER.SVME. Attempts to do so are ignored. 


When EFER.SVME is 1, RSM reloads the four PDPEs (through the incoming CR3) when returning to 
a mode that has legacy PAE mode paging enabled. 


When EFER.SVME is 1, the RSM instruction is permitted to return to paged real mode (i.e.,
CR0.PE=0 and CR0.PG=1).


The AMD64 architecture uses a new 64-bit SMM state-save memory image. This 64-bit save-state
map is used regardless of the operating mode. See “System-Management Mode” in Volume 2 for
details.


Mnemonic Opcode Description 
RSM 0F AA Resume operation of an interrupted program.


Related Instructions 


None 
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rFLAGS Affected 


All flags are restored from the state-save map (SSM). 


ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
M    M    M    M    M    M    M    M      M    M    M    M    M    M    M    M    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 


Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                                     The processor was not in System Management Mode (SMM).
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RSTORSSP Restore Saved Shadow Stack Pointer 


Restores SSP using the shadow stack restore token pointed to by the memory operand. If the token 
validation checks pass, SSP is set to the linear address of the memory operand and the restore token is 
replaced with a previous SSP token. 


If a return to the previous shadow stack is required, the SAVEPREVSSP instruction can be used to 
save the previous SSP token to the previous stack. Otherwise, the INCSSP instruction can be used to 
pop the unneeded previous SSP token from the shadow stack. 


If the restored SSP is 4-byte aligned and not 8-byte aligned, CF is set to 1 indicating an alignment hole. 
The INCSSP instruction can be used to increment SSP past the alignment hole. 


Mnemonic Opcode Description 
RSTORSSP mem64 F3 0F 01 /5 Restore SSP and create previous SSP token.
Action 


// see "Pseudocode Definition" on page 57

IF ((CPL == 3) && (!SSTK_USER_ENABLED))
    EXCEPTION [#UD]

IF ((CPL < 3) && (!SSTK_SUPV_ENABLED))
    EXCEPTION [#UD]

temp_linAdr = Linear_Address(mem64)

IF (temp_linAdr is not 8-byte aligned)
    EXCEPTION [#GP(0)]

bool INVALID_TOKEN = FALSE

< start atomic section >

temp_rstorToken = SSTK_READ_MEM.q [mem64]       // fetch token, with locked read

IF ((temp_rstorToken AND 0x02) != 0)
    INVALID_TOKEN = TRUE                        // token bit 1 must be clear

IF (64BIT_MODE != (temp_rstorToken AND 0x01))
    INVALID_TOKEN = TRUE                        // token bit 0 must match current mode

IF ((!64BIT_MODE) && (temp_rstorToken[63:32] != 0))
    INVALID_TOKEN = TRUE                        // previous SSP must be <4GB in
                                                // legacy and compat modes

temp_prevSSP = (temp_rstorToken AND ~0x01) - 8
temp_prevSSP = temp_prevSSP AND ~0x07

IF (temp_prevSSP != temp_linAdr)
    INVALID_TOKEN = TRUE                        // prev SSP from token must match lin addr

temp_prevSSPtoken = SSP OR 64BIT_MODE OR 0x02   // create the previous SSP token

SSTK_WRITE_MEM.q [mem64] = INVALID_TOKEN ? temp_rstorToken : temp_prevSSPtoken
                                                // write token and unlock

< end atomic section >

IF (INVALID_TOKEN)
    EXCEPTION [#CP(RSTORSSP)]
ELSE
{
    SSP = temp_linAdr                           // SSP = linear address of memory operand
    RFLAGS.ZF,PF,AF,OF,SF = 0
    RFLAGS.CF = (temp_rstorToken AND 0x04) ? 1 : 0;     // set CF if SSP in token
                                                        // was 4-byte aligned
}

EXIT
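
The token checks in the Action section can be restated as plain C predicates. This is a sketch of the arithmetic only: the function names are hypothetical, the mode is passed in as a boolean, and the atomic read-modify-write, the alignment check on the operand, and all fault delivery are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a restore token read from lin_addr passes the checks
   in the pseudocode: bit 1 clear, bit 0 equal to the current mode, the
   upper half zero outside 64-bit mode, and the encoded address matching
   the linear address of the token itself. */
bool restore_token_is_valid(uint64_t token, uint64_t lin_addr, bool mode64)
{
    if ((token & 0x02) != 0)
        return false;
    if (((token & 0x01) != 0) != mode64)
        return false;
    if (!mode64 && (token >> 32) != 0)
        return false;

    uint64_t prev_ssp = ((token & ~UINT64_C(0x01)) - 8) & ~UINT64_C(0x07);
    return prev_ssp == lin_addr;
}

/* CF after a successful restore: set if bit 2 of the token is set. */
bool restore_token_sets_cf(uint64_t token)
{
    return (token & 0x04) != 0;
}

/* Previous SSP token written back in place of the restore token:
   the old SSP with the mode in bit 0 and bit 1 set. */
uint64_t make_previous_ssp_token(uint64_t ssp, bool mode64)
{
    return ssp | (mode64 ? 1u : 0u) | 0x02u;
}
```

For example, with a token stored at linear address 1000h in 64-bit mode, the value 1009h passes the checks, while 100Bh fails because bit 1 is set.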


Related Instructions 


SAVEPREVSSP 
rFLAGS Affected 
ID   VIP  VIF  AC   VM   RF   NT   IOPL   OF   DF   IF   TF   SF   ZF   AF   PF   CF
                                          0                   0    0    0    0    M
21   20   19   18   17   16   14   13:12  11   10   9    8    7    6    4    2    0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank. 
Undefined flags are U. 


Exceptions 

Exception                Real  Virtual 8086  Protected  Cause of Exception
Invalid opcode, #UD                          X          CR4.CET = 0
                                             X          Shadow stacks not enabled at current privilege level.
General protection, #GP                      X          The linear address was not 8-byte aligned.
                                             X          A memory address exceeded a data segment limit.
                                             X          In long mode, the address of the memory operand was non-canonical.
                                             X          A null data segment was used to reference memory.
                                             X          A non-writeable data segment was used.
                                             X          An execute-only code segment was used to reference memory.
Control protection, #CP                      X          The mode bit (bit 0) in the token did not match the current mode.
                                             X          The type bit (bit 1) in the token was not 0.
                                             X          The SSP address in the token did not match the linear address of the memory operand.
Page fault, #PF                              X          The linear address was not a shadow stack page.
                                             X          A page fault resulted from the execution of the instruction.
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SAVEPREVSSP Save Previous Shadow Stack Pointer 


Saves a restore shadow stack token to the previous shadow stack. The previous SSP is taken from
the previous SSP token found at the top of the current shadow stack. The previous SSP token is then
popped from the current shadow stack.


Mnemonic Opcode Description 
SAVEPREVSSP F3 0F 01 EA Save a restore shadow stack token to the previous shadow stack.
Action 


// see "Pseudocode Definition" on page 57

IF ((CPL == 3) && (!SSTK_USER_ENABLED))
    EXCEPTION [#UD]

IF ((CPL < 3) && (!SSTK_SUPV_ENABLED))
    EXCEPTION [#UD]

IF (SSP is not 8-byte aligned)
    EXCEPTION [#GP(0)]

temp_prevSSPtoken = SSTK_READ_MEM.q [SSP]      // pop prev SSP token
                                               // from current stack
temp_SSP = SSP
temp_SSP = temp_SSP + 8

IF (RFLAGS.CF)                  // CF indicates a 4-byte alignment hole exists
    IF (64BIT_MODE)
        EXCEPTION [#GP(0)]      // alignment hole allowed only in legacy/compat mode
    ELSE
        hole = SSTK_READ_MEM.d [temp_SSP]      // pop the 4-byte alignment hole
        temp_SSP = temp_SSP + 4
        IF (hole != 0)
            EXCEPTION [#GP(0)]  // the alignment hole must be all 0's

IF ((temp_prevSSPtoken AND 0x02) == 0)
    EXCEPTION [#GP(0)]          // prev SSP token must have bit 1 set

IF (64BIT_MODE != (temp_prevSSPtoken AND 0x01))
    EXCEPTION [#GP(0)]          // token bit 0 must match current mode

IF ((!64BIT_MODE) && (temp_prevSSPtoken[63:32] != 0))
    EXCEPTION [#GP(0)]          // previous SSP must be < 4 GB in
                                // legacy and compat modes

temp_oldSSP = temp_prevSSPtoken AND ~0x03
temp_rstorSSPtoken = temp_oldSSP OR (64BIT_MODE)   // create the restore SSP token

SSTK_WRITE_MEM.d [temp_oldSSP - 4] = 0x0       // zero out hole (in case aligning
                                               // oldSSP creates a hole)
temp_oldSSP = temp_oldSSP AND ~0x07            // align oldSSP to next 8-byte boundary
SSTK_WRITE_MEM.q [temp_oldSSP - 8] = temp_rstorSSPtoken   // write restore token to
                                                          // old stack

SSP = temp_SSP                  // no faults, update SSP
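
The token arithmetic in the pseudocode above can be traced with a small software model. The following Python sketch is illustrative only and is not part of the architectural definition: the function name make_restore_token and the use of ValueError to stand in for #GP(0) are assumptions made for this example. It applies the three token checks and returns the address and value of the restore token that would be written to the previous shadow stack.

```python
def make_restore_token(prev_ssp_token: int, mode_64bit: bool):
    """Model the token checks and restore-token computation of SAVEPREVSSP.

    Returns (write_address, restore_token). ValueError is a stand-in for
    #GP(0); it is an assumption of this sketch, not processor behavior.
    """
    if (prev_ssp_token & 0x02) == 0:
        raise ValueError("#GP(0): previous SSP token must have bit 1 set")
    if bool(prev_ssp_token & 0x01) != mode_64bit:
        raise ValueError("#GP(0): token bit 0 must match current mode")
    if not mode_64bit and (prev_ssp_token >> 32) != 0:
        raise ValueError("#GP(0): previous SSP must be below 4 GB")

    old_ssp = prev_ssp_token & ~0x03            # strip the two flag bits
    restore_token = old_ssp | int(mode_64bit)   # bit 0 records the mode
    aligned_ssp = old_ssp & ~0x07               # round down to an 8-byte boundary
    return aligned_ssp - 8, restore_token
```

With a previous SSP token of 0x1003 in 64-bit mode, the model returns a restore token of 0x1001 to be written at address 0x0FF8. The alignment-hole handling and the shadow stack memory accesses are deliberately left out of the sketch.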


Related Instructions

RSTORSSP

rFLAGS Affected

None.

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Invalid opcode, #UD       X       X                Instruction is only recognized in protected mode.
                                            X      CR4.CET = 0
                                            X      Shadow stacks not enabled at current privilege level.
General protection, #GP                     X      The SSP was not 8-byte aligned.
                                            X      The type bit (bit 1) in the token was not 1.
                                            X      CF was set in 64-bit mode.
                                            X      The previous SSP was >4 GB when not in 64-bit mode.
                                            X      A non-zero alignment hole was found in legacy or
                                                   compatibility mode.
Page fault, #PF                             X      A page fault resulted from the execution of the
                                                   instruction.
                                            X      A shadow stack reference was made to a non-shadow
                                                   stack page.
SETSSBSY Set Shadow Stack Busy


Validates a non-busy (not in-use) shadow stack token pointed to by the PL0_SSP MSR and sets the
token’s busy bit. If the validation checks pass, SSP is set to the address in PL0_SSP.


SETSSBSY is a privileged instruction and must be executed with CPL = 0; otherwise a #GP exception is
generated. If shadow stacks are not enabled at the supervisor level, a #UD exception is generated.


Mnemonic Opcode Description
SETSSBSY F3 0F 01 E8 Validate token and set shadow stack busy bit

Action


// see "Pseudocode Definition" on page 57

IF (CR4.CET == 0)
    EXCEPTION [#UD]

IF (S_CET.SH_STK_EN == 0)
    EXCEPTION [#UD]

IF (CPL != 0)
    EXCEPTION [#GP(0)]

temp_newSSP = PL0_SSP

IF (temp_newSSP is not 8-byte aligned)
    EXCEPTION [#GP(0)]

bool FAULT = FALSE
< start atomic section >
temp_Token = SSTK_READ_MEM.q [temp_newSSP]     // fetch token with locked read
IF ((!64BIT_MODE) && (temp_Token[63:32] != 0))
    FAULT = TRUE                // address in token must be < 4 GB
                                // in legacy/compatibility mode
IF ((temp_Token AND 0x01) != 0)
    FAULT = TRUE                // token busy bit must be 0
IF ((temp_Token AND ~0x01) != temp_newSSP)
    FAULT = TRUE                // address in token must match new SSP
IF (!FAULT)
    temp_Token = temp_Token OR 0x01            // if no faults, set token busy bit
SSTK_WRITE_MEM.q [temp_newSSP] = temp_Token    // write token and unlock
< end atomic section >

IF (FAULT)
    EXCEPTION [#CP(SETSSBSY)]
ELSE
    SSP = temp_newSSP           // if no faults, SSP = PL0_SSP

EXIT


Related Instructions

CLRSSBSY

rFLAGS Affected

None

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Invalid opcode, #UD       X       X                Instruction is only recognized in protected mode.
                                            X      CR4.CET = 0.
                                            X      Shadow stacks not enabled at supervisor level.
General protection, #GP                     X      CPL != 0
                                            X      PL0_SSP MSR is not 8-byte aligned.
Control protection, #CP                     X      The shadow stack token is busy.
                                            X      The shadow stack token reserved bits are not 0.
                                            X      PL0_SSP MSR >4 GB when not in 64-bit mode.
                                            X      The new SSP in the token != PL0_SSP.
Page fault, #PF                             X      PL0_SSP MSR is not a supervisor shadow stack page.
                                            X      A page fault resulted from the execution of the
                                                   instruction.
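
The checks that the Action pseudocode for SETSSBSY performs on the token can be modeled in a few lines. The following Python sketch is illustrative only: the function name setssbsy_token_check and the use of ValueError in place of #CP(SETSSBSY) are assumptions of this example, and the locked read and write of shadow stack memory are not modeled.

```python
def setssbsy_token_check(token: int, pl0_ssp: int, mode_64bit: bool) -> int:
    """Return the token with its busy bit set, or raise if a check fails.

    ValueError stands in for #CP(SETSSBSY); that mapping is an assumption
    of this sketch.
    """
    fault = False
    if not mode_64bit and (token >> 32) != 0:
        fault = True                 # address in token must be below 4 GB
    if (token & 0x01) != 0:
        fault = True                 # busy bit must be 0 on entry
    if (token & ~0x01) != pl0_ssp:
        fault = True                 # address in token must match PL0_SSP
    if fault:
        raise ValueError("#CP(SETSSBSY)")
    return token | 0x01              # token value written back, busy bit set
```

As in the pseudocode, all three checks are evaluated before the fault is reported, so a token that fails more than one check still produces a single exception.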
SGDT Store Global Descriptor Table Register


Stores the global descriptor table register (GDTR) into the destination operand. In legacy and 
compatibility mode, the destination operand is 6 bytes; in 64-bit mode, it is 10 bytes. In all modes, 
operand-size prefixes are ignored. 


In non-64-bit mode, the lower two bytes of the operand specify the 16-bit limit and the upper 4 bytes 
specify the 32-bit base address. 


In 64-bit mode, the lower two bytes of the operand specify the 16-bit limit and the upper 8 bytes 
specify the 64-bit base address. 
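
The two operand layouts described above can be illustrated by decoding a stored image in software. The following Python sketch is not part of the instruction definition; the function name decode_table_register is hypothetical, and the sketch assumes the image has already been copied out of memory as a byte string. The same layout applies to the operand written by SIDT.

```python
import struct


def decode_table_register(image: bytes):
    """Split a stored GDTR/IDTR image into (limit, base).

    A 6-byte image uses the legacy/compatibility layout (16-bit limit,
    32-bit base); a 10-byte image uses the 64-bit layout (16-bit limit,
    64-bit base). Both are little-endian with the limit in the low bytes.
    """
    if len(image) == 6:
        limit, base = struct.unpack("<HI", image)
    elif len(image) == 10:
        limit, base = struct.unpack("<HQ", image)
    else:
        raise ValueError("expected a 6-byte or 10-byte image")
    return limit, base
```

For example, the 6-byte image FF 0F 00 10 00 00 decodes to a limit of 0FFFh and a base address of 0000_1000h.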


This instruction is intended for use in operating system software, but it can be used at any privilege 
level. 


Mnemonic Opcode Description
SGDT mem16:32 0F 01 /0 Store global descriptor table register to memory.
SGDT mem16:64 0F 01 /0 Store global descriptor table register to memory.


Related Instructions 


SIDT, SLDT, STR, LGDT, LIDT, LLDT, LTR 


rFLAGS Affected

None

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Invalid opcode, #UD       X       X         X      The operand was a register.
Stack, #SS                X       X         X      A memory address exceeded the stack segment limit or was
                                                   non-canonical.
General protection, #GP   X       X         X      A memory address exceeded a data segment limit or was
                                                   non-canonical.
                                            X      The destination operand was in a non-writable segment.
                                            X      A null data segment was used to reference memory.
Page fault, #PF                   X         X      A page fault resulted from the execution of the instruction.
Alignment check, #AC              X         X      An unaligned memory reference was performed while
                                                   alignment checking was enabled.
SIDT Store Interrupt Descriptor Table Register


Stores the interrupt descriptor table register (IDTR) in the destination operand. In legacy and 
compatibility mode, the destination operand is 6 bytes; in 64-bit mode it is 10 bytes. In all modes, 
operand-size prefixes are ignored. 


In non-64-bit mode, the lower two bytes of the operand specify the 16-bit limit and the upper 4 bytes 
specify the 32-bit base address. 


In 64-bit mode, the lower two bytes of the operand specify the 16-bit limit and the upper 8 bytes 
specify the 64-bit base address. 


This instruction is intended for use in operating system software, but it can be used at any privilege 
level. 


Mnemonic Opcode Description
SIDT mem16:32 0F 01 /1 Store interrupt descriptor table register to memory.
SIDT mem16:64 0F 01 /1 Store interrupt descriptor table register to memory.


Related Instructions 


SGDT, SLDT, STR, LGDT, LIDT, LLDT, LTR 


rFLAGS Affected

None

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Invalid opcode, #UD       X       X         X      The operand was a register.
Stack, #SS                X       X         X      A memory address exceeded the stack segment limit or was
                                                   non-canonical.
General protection, #GP   X       X         X      A memory address exceeded a data segment limit or was
                                                   non-canonical.
                                            X      The destination operand was in a non-writable segment.
                                            X      A null data segment was used to reference memory.
Page fault, #PF                   X         X      A page fault resulted from the execution of the instruction.
Alignment check, #AC              X         X      An unaligned memory reference was performed while
                                                   alignment checking was enabled.
SKINIT Secure Init and Jump with Attestation


Securely reinitializes the CPU, allowing for the startup of trusted software (such as a VMM). The code
to be executed after reinitialization can be verified based on a secure hash comparison. SKINIT takes
the physical base address of the SLB as its only input operand, in EAX. The SLB must be structured as
described in “Secure Loader Block” on page 499 of the AMD64 Architecture Programmer’s Manual
Volume 2: System Programming, order# 24593, and is assumed to contain the code for a Secure Loader
(SL).


This is a Secure Virtual Machine (SVM) instruction. Support for the SVM architecture and the SVM
instructions is indicated by CPUID Fn8000_0001_ECX[SVM] = 1. For more information on using the
CPUID instruction, see the reference page for the CPUID instruction on page 165.


This instruction generates a #UD exception if SVM is not enabled. See “Enabling SVM” in AMD64
Architecture Programmer’s Manual Volume 2: System Programming, order# 24593.


Mnemonic Opcode Description
SKINIT EAX 0F 01 DE Secure initialization and jump, with attestation.

Action
IF ((EFER.SVME == 0) && !(CPUID 8000_0001.ECX[SKINIT]) || (!PROTECTED_MODE))
    EXCEPTION [#UD]             // This instruction can only be executed
                                // in protected mode with SVM enabled.

IF (CPL != 0)                   // This instruction is only allowed at CPL 0.
    EXCEPTION [#GP]

Initialize processor state as for an INIT signal
CR0.PE = 1

CS.sel   = 0x0008
CS.attr  = 32-bit code, read/execute
CS.base  = 0
CS.limit = 0xFFFFFFFF

SS.sel   = 0x0010
SS.attr  = 32-bit stack, read/write, expand up
SS.base  = 0
SS.limit = 0xFFFFFFFF

EAX = EAX & 0xFFFF0000          // Form SLB base address.
EDX = family/model/stepping
ESP = EAX + 0x00010000          // Initial SL stack.
Clear GPRs other than EAX, EDX, ESP

EFER = 0
VM_CR.DPD = 1
VM_CR.R_INIT = 1
VM_CR.DIS_A20M = 1

Enable SL_DEV, to protect 64Kbyte of physical memory starting at
    the physical address in EAX
GIF = 0

Read the SL length from offset 0x0002 in the SLB
Copy the SL image to the TPM for attestation

Read the SL entrypoint offset from offset 0x0000 in the SLB
Jump to the SL entrypoint, at EIP = EAX + entrypoint offset
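
The address arithmetic at the end of the Action above can be checked with a short software model. The following Python sketch is illustrative only. The function name skinit_layout is hypothetical, and the sketch assumes that the entry point offset and the SL length are stored as 16-bit little-endian words at offsets 0x0000 and 0x0002 of the SLB; the field widths are not stated on this page and should be confirmed against the “Secure Loader Block” description in Volume 2.

```python
import struct


def skinit_layout(eax: int, slb_header: bytes) -> dict:
    """Derive the SLB base, initial stack, SL length, and entry point.

    slb_header holds at least the first 4 bytes of the SLB. The 16-bit
    field widths are an assumption of this sketch.
    """
    slb_base = eax & 0xFFFF0000                  # form SLB base address
    entry_offset, sl_length = struct.unpack_from("<HH", slb_header, 0)
    return {
        "slb_base": slb_base,
        "esp": slb_base + 0x00010000,            # initial SL stack
        "sl_length": sl_length,                  # read from offset 0x0002
        "eip": slb_base + entry_offset,          # offset read from 0x0000
    }
```

For an input EAX of 1234_5678h and an SLB whose first two words are 0010h and 0200h, the model gives an SLB base of 1234_0000h, an initial ESP of 1235_0000h, and an entry point of 1234_0010h.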


Related Instructions

None.

rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
0  | 0   | 0   | 0  | 0  | 0  | 0  | 0     | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are blank.
Undefined flags are U.

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Invalid opcode, #UD                         X      Secure Virtual Machine was not enabled (EFER.SVME=0)
                                                   and both of the following conditions were true:
                                                   • SVM-Lock is not available, as indicated by
                                                     CPUID Fn8000_000A_EDX[SVML] = 0.
                                                   • DEV is not available, as indicated by
                                                     CPUID Fn8000_0001_ECX[SKINIT] = 0.
                          X       X                Instruction is only recognized in protected mode.
General protection, #GP                     X      CPL was not 0.
SLDT Store Local Descriptor Table Register

Stores the local descriptor table (LDT) selector to a register or memory destination operand. 


If the destination is a register, the selector is zero-extended into a 16-, 32-, or 64-bit general purpose 
register, depending on operand size. 


If the destination operand is a memory location, the segment selector is written to memory as a 16-bit 
value, regardless of operand size. 


The SLDT instruction can only be used in protected mode, but it can be executed at any privilege
level.


Mnemonic Opcode Description
SLDT reg16 0F 00 /0 Store the segment selector from the local descriptor table register to a 16-bit register.
SLDT reg32 0F 00 /0 Store the segment selector from the local descriptor table register to a 32-bit register.
SLDT reg64 0F 00 /0 Store the segment selector from the local descriptor table register to a 64-bit register.
SLDT mem16 0F 00 /0 Store the segment selector from the local descriptor table register to a 16-bit memory location.

Related Instructions

SIDT, SGDT, STR, LIDT, LGDT, LLDT, LTR

rFLAGS Affected

None

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Invalid opcode, #UD       X       X                This instruction is only recognized in protected mode.
Stack, #SS                                  X      A memory address exceeded the stack segment limit or was
                                                   non-canonical.
General protection, #GP                     X      A memory address exceeded a data segment limit or was
                                                   non-canonical.
                                            X      The destination operand was in a non-writable segment.
                                            X      A null data segment was used to reference memory.
Page fault, #PF                             X      A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X      An unaligned memory reference was performed while
                                                   alignment checking was enabled.
SMSW Store Machine Status Word


Stores the lower bits of the machine status word (CR0). The target can be a 16-, 32-, or 64-bit register
or a 16-bit memory operand.


This instruction is provided for compatibility with early processors.


This instruction can be used at any privilege level (CPL).


Mnemonic Opcode Description
SMSW reg16 0F 01 /4 Store the low 16 bits of CR0 to a 16-bit register.
SMSW reg32 0F 01 /4 Store the low 32 bits of CR0 to a 32-bit register.
SMSW reg64 0F 01 /4 Store the entire 64-bit CR0 to a 64-bit register.
SMSW mem16 0F 01 /4 Store the low 16 bits of CR0 to memory.


Related Instructions 


LMSW, MOV CRn 


rFLAGS Affected

None

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Stack, #SS                X       X         X      A memory address exceeded the stack segment limit or was
                                                   non-canonical.
General protection, #GP   X       X         X      A memory address exceeded a data segment limit or was
                                                   non-canonical.
                                            X      The destination operand was in a non-writable segment.
                                            X      A null data segment was used to reference memory.
Page fault, #PF                   X         X      A page fault resulted from the execution of the instruction.
Alignment check, #AC              X         X      An unaligned memory reference was performed while
                                                   alignment checking was enabled.
STAC Set Alignment Check Flag


Sets the Alignment Check flag in the rFLAGS register to one. Support for the STAC instruction is
indicated by CPUID Fn07_EBX[20] = 1. For more information on using the CPUID instruction, see
the description of the CPUID instruction on page 165.


Mnemonic Opcode Description
STAC 0F 01 CB Sets the AC flag


Related Instructions

CLAC

rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     |     | 1  |    |    |    |       |    |    |    |    |    |    |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to 1 or cleared to 0 is M (modified). Unaffected flags are
blank. Undefined flags are U.

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Invalid opcode, #UD       X       X         X      Instruction not supported by CPUID
                                  X                Instruction is not supported in virtual mode
                                            X      Lock prefix (F0h) preceding opcode.
                                            X      CPL was not 0
STI Set Interrupt Flag


Sets the interrupt flag (IF) in the rFLAGS register to 1, thereby allowing external interrupts received 
on the INTR input. Interrupts received on the non-maskable interrupt (NMI) input are not affected by 
this instruction. 


In real mode, this instruction sets IF to 1. 


In protected mode and virtual-8086-mode, this instruction is IOPL-sensitive. If the CPL is less than or 
equal to the rFLAGS.IOPL field, the instruction sets IF to 1. 


In protected mode, if IOPL < 3, CPL = 3, and protected mode virtual interrupts are enabled 

(CR4.PVI = 1), then the instruction instead sets rFLAGS.VIF to 1. If none of these conditions apply, 
the processor raises a general protection exception (#GP). For more information, see “Protected Mode 
Virtual Interrupts” in Volume 2. 


In virtual-8086 mode, if IOPL < 3 and the virtual-8086-mode extensions are enabled (CR4.VME = 1),
the STI instruction instead sets the virtual interrupt flag (rFLAGS.VIF) to 1.


If STI sets the IF flag and IF was initially clear, then interrupts are not enabled until after the 
instruction following STI. Thus, if IF is 0, this code will not allow an INTR to happen: 


STI 
CLI 


In the following sequence, INTR will be allowed to happen only after the NOP. 


STI 
NOP 
CLI 


If STI sets the VIF flag and VIP is already set, a #GP fault will be generated. 


See “Virtual-8086 Mode Extensions” in Volume 2 for more information about IOPL-sensitive 
instructions. 


Mnemonic Opcode Description
STI FB Set interrupt flag (IF) to 1.

Action

IF (CPL <= IOPL)
    RFLAGS.IF = 1
ELSIF (((VIRTUAL_MODE) && (CR4.VME == 1))
       || ((PROTECTED_MODE) && (CR4.PVI == 1) && (CPL == 3)))
    IF (RFLAGS.VIP == 1)
        EXCEPTION[#GP(0)]
    RFLAGS.VIF = 1
ELSE
    EXCEPTION[#GP(0)]

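
The decision made by the Action for STI can be exercised with a small software model. The following Python sketch is illustrative only: the function name sti, the dictionary representation of rFLAGS, the mode strings, and the use of PermissionError in place of #GP(0) are all assumptions of this example. The one-instruction delay before interrupts are recognized is not modeled.

```python
def sti(flags: dict, cpl: int, mode: str,
        cr4_vme: bool = False, cr4_pvi: bool = False) -> dict:
    """Return a copy of flags after STI, or raise for #GP(0).

    flags needs the keys "IOPL", "IF", "VIF", and "VIP". mode is one of
    "real", "virtual", or "protected". In real mode the caller passes
    cpl=0, so the first branch is always taken.
    """
    out = dict(flags)
    if cpl <= flags["IOPL"]:
        out["IF"] = 1
    elif (mode == "virtual" and cr4_vme) or \
         (mode == "protected" and cr4_pvi and cpl == 3):
        if flags["VIP"] == 1:
            raise PermissionError("#GP(0)")   # virtual interrupt pending
        out["VIF"] = 1
    else:
        raise PermissionError("#GP(0)")
    return out
```

With IOPL = 0, CPL = 3, and CR4.PVI = 1 in protected mode, the model sets VIF and leaves IF unchanged. With CR4.PVI = 0 under the same conditions it reports #GP(0).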
Related Instructions

CLI

rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
   |     | M   |    |    |    |    |       |    |    | M  |    |    |    |    |    |
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. M (modified) is either set to one or cleared to zero. Unaffected flags are
blank. Undefined flags are U.

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
General protection, #GP           X                The CPL was greater than the IOPL and virtual-mode
                                                   extensions were not enabled (CR4.VME = 0).
                                            X      The CPL was greater than the IOPL and either the CPL was
                                                   not 3 or protected-mode virtual interrupts were not enabled
                                                   (CR4.PVI = 0).
                                  X         X      This instruction would set RFLAGS.VIF to 1 and
                                                   RFLAGS.VIP was already 1.
STGI Set Global Interrupt Flag

Sets the global interrupt flag (GIF) to 1. While GIF is zero, all external interrupts are disabled.
This is a Secure Virtual Machine (SVM) instruction.


Attempted execution of this instruction causes a #UD exception if SVM is not enabled and neither
SVM Lock nor the device exclusion vector (DEV) is supported. Support for SVM Lock is indicated
by CPUID Fn8000_000A_EDX[SVML] = 1. Support for DEV is part of the SKINIT architecture and
is indicated by CPUID Fn8000_0001_ECX[SKINIT] = 1. For more information on using the CPUID
instruction, see the description of the CPUID instruction on page 165.


For information on enabling SVM, see “Enabling SVM” in AMD64 Architecture Programmer’s
Manual Volume 2: System Programming, order# 24593.


Mnemonic Opcode Description
STGI 0F 01 DC Sets the global interrupt flag (GIF).


Related Instructions 


CLGI 

rFLAGS Affected

None.

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Invalid opcode, #UD                         X      Secure Virtual Machine was not enabled (EFER.SVME=0)
                                                   and both of the following conditions were true:
                                                   • SVM Lock is not available, as indicated by
                                                     CPUID Fn8000_000A_EDX[SVML] = 0.
                                                   • DEV is not available, as indicated by
                                                     CPUID Fn8000_0001_ECX[SKINIT] = 0.
                          X       X                Instruction is only recognized in protected mode.
General protection, #GP                     X      CPL was not 0.
STR Store Task Register


Stores the task register (TR) selector to a register or memory destination operand. 


If the destination is a register, the selector is zero-extended into a 16-, 32-, or 64-bit general purpose 
register, depending on the operand size. 


If the destination is a memory location, the segment selector is written to memory as a 16-bit value, 
regardless of operand size. 


The STR instruction can only be used in protected mode, but it can be used at any privilege level. 


Mnemonic Opcode Description
STR reg16 0F 00 /1 Store the segment selector from the task register to a 16-bit general-purpose register.
STR reg32 0F 00 /1 Store the segment selector from the task register to a 32-bit general-purpose register.
STR reg64 0F 00 /1 Store the segment selector from the task register to a 64-bit general-purpose register.
STR mem16 0F 00 /1 Store the segment selector from the task register to a 16-bit memory location.


Related Instructions 


LGDT, LIDT, LLDT, LTR, SIDT, SGDT, SLDT 


rFLAGS Affected

None

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Invalid opcode, #UD       X       X                This instruction is only recognized in protected mode.
Stack, #SS                                  X      A memory address exceeded the stack segment limit or was
                                                   non-canonical.
General protection, #GP                     X      A memory address exceeded a data segment limit or was
                                                   non-canonical.
                                            X      The destination operand was in a non-writable segment.
                                            X      A null data segment was used to reference memory.
Page fault, #PF                             X      A page fault resulted from the execution of the instruction.
Alignment check, #AC                        X      An unaligned memory reference was performed while
                                                   alignment checking was enabled.
SWAPGS Swap GS Register with KernelGSbase MSR


Provides a fast method for system software to load a pointer to system data structures. SWAPGS can
be used upon entering system-software routines as a result of a SYSCALL instruction, an interrupt, or
an exception. Prior to returning to application software, SWAPGS can be used to restore the
application data pointer that was replaced by the system data-structure pointer.


This instruction can only be executed in 64-bit mode. Executing SWAPGS in any other mode 
generates an undefined opcode exception. 


The SWAPGS instruction only exchanges the base-address value located in the KernelGSbase model- 
specific register (MSR address C000_0102h) with the base-address value located in the hidden- 
portion of the GS selector register (GS.base). This allows the system-kernel software to access kernel 
data structures by using the GS segment-override prefix during memory references. 


The address stored in the KernelGSbase MSR must be in canonical form. The WRMSR instruction 
used to load the KernelGSbase MSR causes a general-protection exception if the address loaded is not 
in canonical form. The SWAPGS instruction itself does not perform a canonical check. 


This instruction is only valid in 64-bit mode at CPL 0. A general protection exception (#GP) is 
generated if this instruction is executed at any other privilege level. 


For additional information about this instruction, refer to “System Instructions” in Volume 2. 


Examples


At a kernel entry point, the OS uses SwapGS to obtain a pointer to kernel data structures and
simultaneously save the user's GS base. Upon exit, it uses SwapGS to restore the user's GS base:


SystemCallEntryPoint:
    SwapGS                          ; get kernel pointer, save user GSbase
    mov gs:[SavedUserRSP], rsp      ; save user's stack pointer
    mov rsp, gs:[KernelStackPtr]    ; set up kernel stack
    push rax                        ; now save user GPRs on kernel stack
    .                               ; perform system service
    .
    SwapGS                          ; restore user GS, save kernel pointer


Mnemonic Opcode Description
SWAPGS 0F 01 F8 Exchange GS base with KernelGSBase MSR. (Invalid in legacy and compatibility modes.)

Related Instructions

None

rFLAGS Affected

None

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Invalid opcode, #UD       X       X         X      This instruction was executed in legacy or
                                                   compatibility mode.
General protection, #GP                     X      CPL was not 0.
SYSCALL Fast System Call


Transfers control to a fixed entry point in an operating system. It is designed for use by system and 
application software implementing a flat-segment memory model. 


The SYSCALL and SYSRET instructions are low-latency system call and return control-transfer 
instructions, which assume that the operating system implements a flat-segment memory model. By 
eliminating unneeded checks, and by loading pre-determined values into the CS and SS segment 
registers (both visible and hidden portions), calls to and returns from the operating system are greatly 
simplified. These instructions can be used in protected mode and are particularly well-suited for use in 
64-bit mode, which requires implementation of a paged, flat-segment memory model. 


This instruction has been optimized by reducing the number of checks and memory references that are 
normally made so that a call or return takes considerably fewer clock cycles than the CALL FAR /RET 
FAR instruction method. 


It is assumed that the base, limit, and attributes of the Code Segment will remain flat for all processes 
and for the operating system, and that only the current privilege level for the selector of the calling 
process should be changed from a current privilege level of 3 to a new privilege level of 0. It is also 
assumed (but not checked) that the RPL of the SYSCALL and SYSRET target selectors are set to 0 
and 3, respectively. 


SYSCALL sets the CPL to 0, regardless of the values of bits 33:32 of the STAR register. There are no 
permission checks based on the CPL, real mode, or virtual-8086 mode. SYSCALL and SYSRET must 
be enabled by setting EFER.SCE to 1. 


It is the responsibility of the operating system to keep the descriptors in memory that correspond to the 
CS and SS selectors loaded by the SYSCALL and SYSRET instructions consistent with the segment 
base, limit, and attribute values forced by these instructions. 


Legacy x86 Mode. In legacy x86 mode, when SYSCALL is executed, the EIP of the instruction
following the SYSCALL is copied into the ECX register. Bits 31:0 of the SYSCALL/SYSRET target
address register (STAR) are copied into the EIP register. (The STAR register is model-specific register
C000_0081h.)


New selectors are loaded, without permission checking (see above), as follows:

• Bits 47:32 of the STAR register specify the selector that is copied into the CS register.
• Bits 47:32 of the STAR register + 8 specify the selector that is copied into the SS register.
• The CS_base and the SS_base are both forced to zero.
• The CS_limit and the SS_limit are both forced to 4 Gbyte.
• The CS segment attributes are set to execute/read 32-bit code with a CPL of zero.
• The SS segment attributes are set to read/write and expand-up with a 32-bit stack referenced by
  ESP.

Long Mode. When long mode is activated, the behavior of the SYSCALL instruction depends on
whether the calling software is in 64-bit mode or compatibility mode. In 64-bit mode, SYSCALL
saves the RIP of the instruction following the SYSCALL into RCX and loads the new RIP from
LSTAR bits 63:0. (The LSTAR register is model-specific register C000_0082h.) In compatibility
mode, SYSCALL saves the RIP of the instruction following the SYSCALL into RCX and loads the
new RIP from CSTAR bits 63:0. (The CSTAR register is model-specific register C000_0083h.)


New selectors are loaded, without permission checking (see above), as follows:

• Bits 47:32 of the STAR register specify the selector that is copied into the CS register.
• Bits 47:32 of the STAR register + 8 specify the selector that is copied into the SS register.
• The CS_base and the SS_base are both forced to zero.
• The CS_limit and the SS_limit are both forced to 4 Gbyte.
• The CS segment attributes are set to execute/read 64-bit code with a CPL of zero.
• The SS segment attributes are set to read/write and expand-up with a 64-bit stack referenced by
  RSP.
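
The selector derivation in the list above can be illustrated with a short software model. The following Python sketch is illustrative only; the function name syscall_selectors is hypothetical. It follows the Action pseudocode for this instruction in masking the two low bits of the CS selector, and it truncates the SS selector to 16 bits, which is an assumption of the sketch rather than something stated on this page.

```python
def syscall_selectors(star: int):
    """Return (CS selector, SS selector) as loaded by SYSCALL.

    star is the 64-bit value of the STAR MSR.
    """
    syscall_cs = (star >> 32) & 0xFFFF     # STAR bits 47:32
    cs_sel = syscall_cs & 0xFFFC           # low two selector bits are masked off
    ss_sel = (syscall_cs + 8) & 0xFFFF     # 16-bit truncation is an assumption
    return cs_sel, ss_sel
```

For a STAR value with 0010h in bits 47:32, the model gives a CS selector of 0010h and an SS selector of 0018h, which is why system software places the SS descriptor in the descriptor-table slot immediately after the CS descriptor.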


The WRMSR instruction loads the target RIP into the LSTAR and CSTAR registers. If an RIP written 
by WRMSR is not in canonical form, a general-protection exception (#GP) occurs. 


How SYSCALL and SYSRET handle rFLAGS depends on the processor’s operating mode.


In legacy mode, SYSCALL treats EFLAGS as follows:

• EFLAGS.IF is cleared to 0.
• EFLAGS.RF is cleared to 0.
• EFLAGS.VM is cleared to 0.


In long mode, SYSCALL treats RFLAGS as follows:

• The current value of RFLAGS is saved in R11.
• RFLAGS is masked using the value stored in SYSCALL_FLAG_MASK.
• RFLAGS.RF is cleared to 0.
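
The long-mode flag handling listed above can be traced with a small software model. The following Python sketch is illustrative only; the function name syscall_long_mode_flags is hypothetical. It follows the Action pseudocode for this instruction, which annotates the value saved in R11 as having RF cleared, and it treats the mask as the set of flag bits to clear.

```python
RF = 1 << 16   # rFLAGS.RF is bit 16


def syscall_long_mode_flags(rflags: int, flag_mask: int):
    """Return (R11, new RFLAGS) for SYSCALL in long mode.

    flag_mask models SYSCALL_FLAG_MASK: every bit set in the mask is
    cleared in RFLAGS.
    """
    r11 = rflags & ~RF                  # saved copy, with RF cleared
    new_rflags = rflags & ~flag_mask    # apply the mask
    new_rflags &= ~RF                   # RF is always cleared
    return r11, new_rflags
```

For example, with RFLAGS = 0001_0246h and a mask of 0000_0200h (IF), R11 receives 0000_0246h and RFLAGS becomes 0000_0046h, so the system-call handler starts with interrupts disabled.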


For further details on the SYSCALL and SYSRET instructions and their associated MSR registers 
(STAR, LSTAR, CSTAR, and SYSCALL_FLAG_MASK), see “Fast System Call and Return” in 
Volume 2. 


Support for the SYSCALL instruction is indicated by CPUID Fn8000_0001_EDX[SysCallSysRet] = 
1. For more information on using the CPUID instruction, see the description of the CPUID instruction 
on page 165. 
Instruction Encoding


Mnemonic Opcode Description
SYSCALL 0F 05 Call operating system.

Action

// See “Pseudocode Definition” on page 57.

SYSCALL_START:

    IF (MSR_EFER.SCE == 0)          // Check if syscall/sysret are enabled.
        EXCEPTION [#UD]

    IF (LONG_MODE)
        SYSCALL_LONG_MODE
    ELSE // (LEGACY_MODE)
        SYSCALL_LEGACY_MODE

SYSCALL_LONG_MODE:

    RCX.q = next_RIP
    R11.q = RFLAGS                  // with rf cleared

    IF (64BIT_MODE)
        temp_RIP.q = MSR_LSTAR
    ELSE // (COMPATIBILITY_MODE)
        temp_RIP.q = MSR_CSTAR

    CS.sel   = MSR_STAR.SYSCALL_CS AND 0xFFFC
    CS.attr  = 64-bit code,dpl0     // Always switch to 64-bit mode in long mode.
    CS.base  = 0x00000000
    CS.limit = 0xFFFFFFFF

    SS.sel   = MSR_STAR.SYSCALL_CS + 8
    SS.attr  = 64-bit stack,dpl0
    SS.base  = 0x00000000
    SS.limit = 0xFFFFFFFF

    RFLAGS = RFLAGS AND ~MSR_SFMASK
    RFLAGS.RF = 0

    IF (ShadowStacksEnabled at current CPL)
        PL3_SSP = SSP

    CPL = 0

    IF (ShadowStacksEnabled at current CPL)
        SSP = 0

    RIP = temp_RIP
    EXIT

SYSCALL_LEGACY_MODE:

    RCX.d = next_RIP

    temp_RIP.d = MSR_STAR.EIP

    CS.sel   = MSR_STAR.SYSCALL_CS AND 0xFFFC
    CS.attr  = 32-bit code,dpl0     // Always switch to 32-bit mode in legacy mode.
    CS.base  = 0x00000000
    CS.limit = 0xFFFFFFFF

    SS.sel   = MSR_STAR.SYSCALL_CS + 8
    SS.attr  = 32-bit stack,dpl0
    SS.base  = 0x00000000
    SS.limit = 0xFFFFFFFF

    RFLAGS.VM,IF,RF = 0
    CPL = 0

    RIP = temp_RIP
    EXIT


Related Instructions 


SYSRET, SYSENTER, SYSEXIT 


rFLAGS Affected

ID | VIP | VIF | AC | VM | RF | NT | IOPL  | OF | DF | IF | TF | SF | ZF | AF | PF | CF
M  | M   | M   | M  | 0  | 0  | M  | M     | M  | M  | M  | M  | M  | M  | M  | M  | M
21 | 20  | 19  | 18 | 17 | 16 | 14 | 13:12 | 11 | 10 | 9  | 8  | 7  | 6  | 4  | 2  | 0

Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or cleared to zero is M (modified). Unaffected flags
are blank. Undefined flags are U.

Exceptions

Exception                Real  Virtual  Protected  Cause of Exception
                               8086
Invalid opcode, #UD       X       X         X      The SYSCALL and SYSRET instructions are not
                                                   supported, as indicated by CPUID
                                                   Fn8000_0001_EDX[SysCallSysRet] = 0.
                          X       X         X      The system call extension bit (SCE) of the extended
                                                   feature enable register (EFER) is set to 0. (The
                                                   EFER register is MSR C000_0080h.)
SYSENTER System Call


Transfers control to a fixed entry point in an operating system. It is designed for use by system and 
application software implementing a flat-segment memory model. This instruction is valid only in 
legacy mode. 


Three model-specific registers (MSRs) are used to specify the target address and stack pointers for the 
SYSENTER instruction, as well as the CS and SS selectors of the called and returned procedures: 


• MSR_SYSENTER_CS: Contains the CS selector of the called procedure. The SS selector is set to
  MSR_SYSENTER_CS + 8.
• MSR_SYSENTER_ESP: Contains the called procedure’s stack pointer.
• MSR_SYSENTER_EIP: Contains the offset into the CS of the called procedure.

The hidden portions of the CS and SS segment registers are not loaded from the descriptor table as
they would be using a legacy x86 CALL instruction. Instead, the hidden portions are forced by the
processor to the following values:

• The CS and SS base values are forced to 0.
• The CS and SS limit values are forced to 4 Gbytes.
• The CS segment attributes are set to execute/read 32-bit code with a CPL of zero.
• The SS segment attributes are set to read/write and expand-up with a 32-bit stack referenced by
  ESP.


System software must create corresponding descriptor-table entries referenced by the new CS and SS 
selectors that match the values described above. 


The return EIP and application stack are not saved by this instruction. System software must explicitly
save that information. 
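
The register loads described above can be modeled in a few lines. The following Python sketch is illustrative only and is not part of the architectural definition: the names SysenterTarget and sysenter_target are hypothetical, and the null-selector test (selector bits 15:2 all zero) is an assumption about how the #GP condition is evaluated.

```python
from dataclasses import dataclass


@dataclass
class SysenterTarget:
    """State forced by SYSENTER (hypothetical model, not an AMD-defined structure)."""
    cs_selector: int
    ss_selector: int
    esp: int
    eip: int
    cpl: int = 0                  # CS attributes are set with a CPL of zero
    seg_base: int = 0             # CS and SS base values are forced to 0
    seg_limit: int = 0xFFFFFFFF   # CS and SS limits are forced to 4 Gbytes


def sysenter_target(msr_cs: int, msr_esp: int, msr_eip: int) -> SysenterTarget:
    """Derive the SYSENTER target state from the three SYSENTER MSRs."""
    # Assumption: a null selector has index and TI bits (15:2) equal to zero.
    if (msr_cs & 0xFFFC) == 0:
        raise ValueError("#GP: MSR_SYSENTER_CS was a null selector")
    return SysenterTarget(
        cs_selector=msr_cs & 0xFFFF,
        ss_selector=(msr_cs + 8) & 0xFFFF,   # SS = MSR_SYSENTER_CS + 8
        esp=msr_esp & 0xFFFFFFFF,
        eip=msr_eip & 0xFFFFFFFF,
    )
```

Because the return EIP and application stack pointer do not appear anywhere in this state, the model also makes it visible why system software has to save them itself.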


An invalid-opcode exception occurs if this instruction is used in long mode. Software should use the 
SYSCALL (and SYSRET) instructions in long mode. If SYSENTER is used in real mode, a #GP is 
raised. 


For additional information on this instruction, see “SYSENTER and SYSEXIT (Legacy Mode Only)” 
in Volume 2. 


Support for the SYSENTER instruction is indicated by CPUID Fn0000_0001_EDX[SysEnterSysExit] 
= 1. For more information on using the CPUID instruction, see the description of the CPUID 
instruction on page 165. 


Instruction Encoding 


Mnemonic    Opcode    Description
SYSENTER    0F 34     Call operating system.


Related Instructions


SYSCALL, SYSEXIT, SYSRET 
rFLAGS Affected

ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
                  0                          0
21  20   19   18  17  16  14  13:12  11  10  9   8   7   6   4   2   0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or cleared to zero is M (modified).
Unaffected flags are blank. Undefined flags are U.


Exceptions

Exception                 Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD       X     X             X          The SYSENTER and SYSEXIT instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SysEnterSysExit] = 0.
                                              X          This instruction is not recognized in long mode.
General protection, #GP   X                              This instruction is not recognized in real mode.
                                X             X          MSR_SYSENTER_CS was a null selector.
SYSEXIT System Return


Returns from the operating system to an application. It is a low-latency system return instruction 
designed for use by system and application software implementing a flat-segment memory model. 


This is a privileged instruction. The current privilege level must be zero to execute this instruction. An 
invalid-opcode exception occurs if this instruction is used in long mode. Software should use the 
SYSRET (and SYSCALL) instructions when running in long mode. 


When a system procedure performs a SYSEXIT back to application software, the CS selector is
updated to point to the second descriptor entry after the SYSENTER CS value
(MSR_SYSENTER_CS + 16). The SS selector is updated to point to the third descriptor entry after the
SYSENTER CS value (MSR_SYSENTER_CS + 24). The CPL is forced to 3, as are the descriptor
privilege levels.


The hidden portions of the CS and SS segment registers are not loaded from the descriptor table as 
they would be using a legacy x86 RET instruction. Instead, the hidden portions are forced by the 
processor to the following values: 


• The CS and SS base values are forced to 0.
• The CS and SS limit values are forced to 4 Gbytes.
• The CS segment attributes are set to 32-bit read/execute at CPL 3.
• The SS segment attributes are set to read/write and expand-up with a 32-bit stack referenced by
  ESP.


System software must create corresponding descriptor-table entries referenced by the new CS and SS 
selectors that match the values described above. 

The following additional actions result from executing SYSEXIT: 

• EIP is loaded from EDX.
• ESP is loaded from ECX.


System software must explicitly load the return address and application software-stack pointer into the 
EDX and ECX registers prior to executing SYSEXIT. 
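
As a rough illustration of the return path, the Python sketch below derives the state SYSEXIT loads from MSR_SYSENTER_CS, EDX, and ECX. The function name sysexit_target and the dictionary layout are hypothetical. The selector values follow the offsets given in the text above; whether the selector RPL bits are additionally forced is not modeled here.

```python
def sysexit_target(msr_cs: int, edx: int, ecx: int, cpl: int = 0) -> dict:
    """Model the state loaded by SYSEXIT (illustrative sketch only)."""
    if cpl != 0:
        # SYSEXIT is privileged; the exceptions table lists #GP for CPL != 0.
        raise PermissionError("#GP: CPL was not 0")
    return {
        "cs_selector": (msr_cs + 16) & 0xFFFF,  # second descriptor after SYSENTER CS
        "ss_selector": (msr_cs + 24) & 0xFFFF,  # third descriptor after SYSENTER CS
        "cpl": 3,                               # CPL is forced to 3
        "eip": edx & 0xFFFFFFFF,                # EIP is loaded from EDX
        "esp": ecx & 0xFFFFFFFF,                # ESP is loaded from ECX
    }
```

A kernel-side caller would therefore place the application return address in EDX and the application stack pointer in ECX immediately before the instruction.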


For additional information on this instruction, see “SYSENTER and SYSEXIT (Legacy Mode Only)” 
in Volume 2. 


Support for the SYSEXIT instruction is indicated by CPUID Fn0000_0001_EDX[SysEnterSysExit] = 
1. For more information on using the CPUID instruction, see the description of the CPUID instruction 
on page 165. 
Instruction Encoding


Mnemonic    Opcode    Description
SYSEXIT     0F 35     Return from operating system to application.


Related Instructions


SYSCALL, SYSENTER, SYSRET



rFLAGS Affected 
ID | VIP | VIF | AC | VM | RF | NT IOPL OF | DF | IF | TF | SF | ZF | AF | PF | CF 
0 
21 | 20 | 19 | 18 | 17 | 16 | 14 13:12 11 | 10 9 8 7 6 4 2 0
Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or cleared to zero is M (modified). Unaffected flags are 
blank. 
Exceptions

Exception                 Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD       X     X             X          The SYSENTER and SYSEXIT instructions are not supported, as indicated by CPUID Fn0000_0001_EDX[SysEnterSysExit] = 0.
                                              X          This instruction is not recognized in long mode.
General protection, #GP   X     X                        This instruction is only recognized in protected mode.
                                              X          CPL was not 0.
                                              X          MSR_SYSENTER_CS was a null selector.
SYSRET Fast System Return


Returns from the operating system to an application. It is a low-latency system return instruction 
designed for use by system and application software implementing a flat segmentation memory model. 


The SYSCALL and SYSRET instructions are low-latency system call and return control-transfer 
instructions that assume that the operating system implements a flat-segment memory model. By 
eliminating unneeded checks, and by loading pre-determined values into the CS and SS segment 
registers (both visible and hidden portions), calls to and returns from the operating system are greatly 
simplified. These instructions can be used in protected mode and are particularly well-suited for use in 
64-bit mode, which requires implementation of a paged, flat-segment memory model. 


This instruction has been optimized by reducing the number of checks and memory references that are 
normally made so that a call or return takes substantially fewer internal clock cycles when compared to 
the CALL/RET instruction method. 


It is assumed that the base, limit, and attributes of the Code Segment will remain flat for all processes 
and for the operating system, and that only the current privilege level for the selector of the calling 
process should be changed from a current privilege level of 0 to a new privilege level of 3. It is also 
assumed (but not checked) that the RPL of the SYSCALL and SYSRET target selectors are set to 0 
and 3, respectively. 


SYSRET sets the CPL to 3, regardless of the values of bits 49:48 of the STAR register. SYSRET can only
be executed in protected mode at CPL 0. SYSCALL and SYSRET must be enabled by setting 
EFER.SCE to 1. 


It is the responsibility of the operating system to keep the descriptors in memory that correspond to the 
CS and SS selectors loaded by the SYSCALL and SYSRET instructions consistent with the segment 
base, limit, and attribute values forced by these instructions. 


When a system procedure performs a SYSRET back to application software, the CS selector is 
updated from bits 63:50 of the STAR register (STAR.SYSRET_CS) as follows: 


• If the return is to 32-bit mode (legacy or compatibility), CS is updated with the value of
  STAR.SYSRET_CS.
• If the return is to 64-bit mode, CS is updated with the value of STAR.SYSRET_CS + 16.

In both cases, the CPL is forced to 3, effectively ignoring STAR bits 49:48. The SS selector is updated
to point to the next descriptor-table entry after the CS descriptor (STAR.SYSRET_CS + 8), and its
RPL is not forced to 3.
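
The selector arithmetic can be checked with a short Python sketch. It assumes STAR.SYSRET_CS occupies STAR bits 63:48 and follows the Action pseudocode for this instruction, in which CS.sel is ORed with 3 and SS.sel is not. The function name sysret_selectors is hypothetical.

```python
def sysret_selectors(star: int, return_to_64bit: bool) -> tuple:
    """Return (CS.sel, SS.sel) as SYSRET would derive them from the STAR MSR."""
    sysret_cs = (star >> 48) & 0xFFFF                  # STAR.SYSRET_CS field
    cs_base = sysret_cs + 16 if return_to_64bit else sysret_cs
    cs_sel = (cs_base | 3) & 0xFFFF                    # CS RPL is forced to 3
    ss_sel = (sysret_cs + 8) & 0xFFFF                  # SS RPL is not forced
    return cs_sel, ss_sel
```

For example, with STAR.SYSRET_CS = 0023h, a return to 64-bit mode yields CS = 0033h and SS = 002Bh, while a return to 32-bit mode yields CS = 0023h and the same SS.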


The hidden portions of the CS and SS segment registers are not loaded from the descriptor table as 
they would be using a legacy x86 RET instruction. Instead, the hidden portions are forced by the 
processor to the following values: 

• The CS base value is forced to 0.
• The CS limit value is forced to 4 Gbytes.
• The CS segment attributes are set to execute-read 32 bits or 64 bits (see below).
• The SS segment base, limit, and attributes are not modified.

When SYSCALLed system software is running in 64-bit mode, it has been entered from either 64-bit
mode or compatibility mode. The corresponding SYSRET needs to know the mode to which it must
return. Executing SYSRET in non-64-bit mode or with a 16- or 32-bit operand size returns to 32-bit
mode with a 32-bit stack pointer. Executing SYSRET in 64-bit mode with a 64-bit operand size returns
to 64-bit mode with a 64-bit stack pointer.


The instruction pointer is updated with the return address based on the operating mode in which 
SYSRET is executed: 


• If returning to 64-bit mode, SYSRET loads RIP with the value of RCX.
• If returning to 32-bit mode, SYSRET loads EIP with the value of ECX.


How SYSRET handles RFLAGS depends on the processor’s operating mode:


• If executed in 64-bit mode, SYSRET loads the lower-32 RFLAGS bits from R11[31:0] and clears
  the upper 32 RFLAGS bits.
• If executed in legacy mode or compatibility mode, SYSRET sets EFLAGS.IF.
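
The instruction-pointer and RFLAGS rules above can be summarized in a small Python model. This is a sketch under stated assumptions, not a definitive implementation: the function names are hypothetical, and clearing RF and VM on the 64-bit path is taken from the "RF=0,VM=0" comment in the Action pseudocode and from the flag bit positions (IF = bit 9, RF = bit 16, VM = bit 17) shown in the rFLAGS tables.

```python
RFLAGS_IF = 1 << 9
RFLAGS_RF = 1 << 16
RFLAGS_VM = 1 << 17


def sysret_rip(rcx: int, return_to_64bit: bool) -> int:
    """RIP is loaded from RCX (64-bit return) or EIP from ECX (32-bit return)."""
    mask = 0xFFFFFFFFFFFFFFFF if return_to_64bit else 0xFFFFFFFF
    return rcx & mask


def sysret_rflags(current_rflags: int, r11: int, executed_in_64bit_mode: bool) -> int:
    """Compute the RFLAGS value that results from SYSRET."""
    if executed_in_64bit_mode:
        # Lower 32 bits come from R11[31:0]; the upper 32 bits are cleared.
        new_flags = r11 & 0xFFFFFFFF
        # Assumption from the pseudocode comment: RF and VM end up cleared.
        return new_flags & ~(RFLAGS_RF | RFLAGS_VM)
    # Legacy or compatibility mode: only EFLAGS.IF is set.
    return current_rflags | RFLAGS_IF
```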


For further details on the SYSCALL and SYSRET instructions and their associated MSR registers 
(STAR, LSTAR, and CSTAR), see “Fast System Call and Return” in Volume 2. 


Support for the SYSRET instruction is indicated by CPUID Fn8000_0001_EDX[SysCallSysRet] = 1. 
For more information on using the CPUID instruction, see the description of the CPUID instruction on 
page 165. 


Instruction Encoding 


Mnemonic    Opcode    Description
SYSRET      0F 07     Return from operating system.

Action


// See “Pseudocode Definition” on page 57.
SYSRET_START:
    IF (MSR_EFER.SCE == 0)          // Check if syscall/sysret are enabled.
        EXCEPTION [#UD]
    IF ((!PROTECTED_MODE) || (CPL != 0))
        EXCEPTION [#GP(0)]          // SYSRET requires protected mode, CPL 0
    IF (64BIT_MODE)
        SYSRET_64BIT_MODE
    ELSE                            // (!64BIT_MODE)
        SYSRET_NON_64BIT_MODE

SYSRET_64BIT_MODE:
    IF (OPERAND_SIZE == 64)         // Return to 64-bit mode.
    {
        CS.sel = (MSR_STAR.SYSRET_CS + 16) OR 3
        CS.base = 0x00000000
        CS.limit = 0xFFFFFFFF
        CS.attr = 64-bit code,dpl3
        temp_RIP.q = RCX
    }
    ELSE                            // Return to 32-bit compatibility mode.
    {
        CS.sel = MSR_STAR.SYSRET_CS OR 3
        CS.base = 0x00000000
        CS.limit = 0xFFFFFFFF
        CS.attr = 32-bit code,dpl3
        temp_RIP.d = RCX
    }
    SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed,
                                    // SS base, limit, attributes unchanged.
    RFLAGS.q = R11                  // RF=0,VM=0
    CPL = 3

    IF (ShadowStacksEnabled at current CPL)
        SSP = PL3_SSP

    RIP = temp_RIP
    EXIT

SYSRET_NON_64BIT_MODE:
    CS.sel = MSR_STAR.SYSRET_CS OR 3 // Return to 32-bit legacy protected mode.
    CS.base = 0x00000000
    CS.limit = 0xFFFFFFFF
    CS.attr = 32-bit code,dpl3
    temp_RIP.d = RCX

    SS.sel = MSR_STAR.SYSRET_CS + 8 // SS selector is changed.
                                    // SS base, limit, attributes unchanged.
    RFLAGS.IF = 1
    CPL = 3

    IF (ShadowStacksEnabled at current CPL)
        SSP = PL3_SSP

    RIP = temp_RIP
    EXIT


Related Instructions 


SYSCALL, SYSENTER, SYSEXIT 


rFLAGS Affected 

ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
M   M    M    M   0   0   M   M      M   M   M   M   M   M   M   M   M
21  20   19   18  17  16  14  13:12  11  10  9   8   7   6   4   2   0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or cleared to zero is M (modified). Unaffected flags 
are blank. Undefined flags are U. 


Exceptions

Exception                 Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD       X     X             X          The SYSCALL and SYSRET instructions are not supported, as indicated by CPUID Fn8000_0001_EDX[SysCallSysRet] = 0.
                          X     X             X          The system call extension bit (SCE) of the extended feature enable register (EFER) is set to 0. (The EFER register is MSR C000_0080h.)
General protection, #GP   X     X                        This instruction is only recognized in protected mode.
                                              X          CPL was not 0.
TLBSYNC Synchronize TLB Invalidations


TLBSYNC acts as a synchronizing instruction to ensure that all logical processors in a system have 
responded to an INVLPGB previously executed by the current logical processor. Upon execution of an 
INVLPGB, the processor does not wait for confirmation that the other processors have performed the 
specified TLB invalidation. A TLBSYNC is therefore required before software can move forward 
with the knowledge that all requested invalidations have been completed in the system. A TLBSYNC 
also ensures that memory instructions using the translations invalidated by those prior INVLPGB 
instructions have retired and writes using the translations have drained from the write combining 
buffers. 


The TLBSYNC instruction is weakly ordered with respect to data and instruction prefetches. 

The TLBSYNC instruction is strongly ordered with respect to surrounding loads and stores. 
TLBSYNC is a serializing instruction and is privileged. It can only be executed at CPL 0. TLBSYNC 
is only supported in guests if enabled by hypervisor in the VMCB. 


Mnemonic    Opcode      Description
TLBSYNC     0F 01 FF    Synchronize broadcasted TLB Invalidations


Related Instructions 


INVLPGB 

rFLAGS Affected 

None 

Exceptions

Exception                 Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD       X     X             X          Instruction not supported as indicated by CPUID Fn8000_0008_EBX[INVLPGB] = 0
                          X     X                        Instruction is only recognized in protected mode
                                              X          The hypervisor has not enabled Guest usage of this instruction.
General protection, #GP                       X          CPL was not 0
VERR Verify Segment for Reads


Verifies whether a code or data segment specified by the segment selector in the 16-bit register or 
memory operand is readable from the current privilege level. The zero flag (ZF) is set to 1 if the 
specified segment is readable. Otherwise, ZF is cleared. 


A segment is readable if all of the following apply: 

• the selector is not a null selector.
• the descriptor is within the GDT or LDT limit.
• the segment is a data segment or readable code segment.
• the descriptor DPL is greater than or equal to both the CPL and RPL, or the segment is a
  conforming code segment.
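
The four conditions above combine into a single predicate. The Python sketch below is a hypothetical model, not processor microcode: the function name verr_zf and its parameters are illustrative, the caller is assumed to pass the limit of whichever table (GDT or LDT) the selector references, and descriptors are assumed to be 8 bytes long for the limit check.

```python
def verr_zf(selector: int, table_limit: int, *, is_data: bool,
            is_readable_code: bool, is_conforming_code: bool,
            dpl: int, cpl: int) -> int:
    """Return the ZF value VERR would produce for the given segment."""
    rpl = selector & 0x3
    # Null selector: index 0 in the GDT (bits 15:2 all zero).
    if (selector & 0xFFFC) == 0:
        return 0
    # Assumption: the last byte of an 8-byte descriptor must lie within the limit.
    if (selector & 0xFFF8) + 7 > table_limit:
        return 0
    if not (is_data or is_readable_code):
        return 0
    privilege_ok = dpl >= cpl and dpl >= rpl
    if not (privilege_ok or is_conforming_code):
        return 0
    return 1
```

Note how a conforming code segment passes the final check even when its DPL is numerically lower than the CPL or RPL.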


The processor does not recognize the VERR instruction in real or virtual-8086 mode. 


Mnemonic          Opcode      Description
VERR reg/mem16    0F 00 /4    Set the zero flag (ZF) to 1 if the segment selected can be read.


Related Instructions 


ARPL, LAR, LSL, VERW 


rFLAGS Affected 


ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
                                                         M
21  20   19   18  17  16  14  13:12  11  10  9   8   7   6   4   2   0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or cleared to zero is M (modified). Unaffected flags are 
blank. Undefined flags are U. 


Exceptions

Exception                 Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD       X     X                        This instruction is only recognized in protected mode.
Stack, #SS                                    X          A memory address exceeded the stack segment limit or is non-canonical.
General protection, #GP                       X          A memory address exceeded a data segment limit or was non-canonical.
                                              X          A null data segment was used to reference memory.
Page fault, #PF                               X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                          X          An unaligned memory reference was performed while alignment checking was enabled.
VERW Verify Segment for Write


Verifies whether a data segment specified by the segment selector in the 16-bit register or memory 
operand is writable from the current privilege level. The zero flag (ZF) is set to 1 if the specified 
segment is writable. Otherwise, ZF is cleared. 


A segment is writable if all of the following apply: 


• the selector is not a null selector.
• the descriptor is within the GDT or LDT limit.
• the segment is a writable data segment.
• the descriptor DPL is greater than or equal to both the CPL and RPL.
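
As with VERR, the writability conditions can be expressed as one predicate. The following Python sketch is illustrative: verw_zf and its parameters are hypothetical names, the relevant table limit is supplied by the caller, and an 8-byte descriptor size is assumed for the limit check.

```python
def verw_zf(selector: int, table_limit: int, *, is_writable_data: bool,
            dpl: int, cpl: int) -> int:
    """Return the ZF value VERW would produce for the given segment."""
    rpl = selector & 0x3
    # Null selector: index 0 in the GDT (bits 15:2 all zero).
    if (selector & 0xFFFC) == 0:
        return 0
    # Assumption: the last byte of an 8-byte descriptor must lie within the limit.
    if (selector & 0xFFF8) + 7 > table_limit:
        return 0
    if not is_writable_data:
        return 0
    # Unlike VERR, there is no conforming-code exception to the privilege check.
    if not (dpl >= cpl and dpl >= rpl):
        return 0
    return 1
```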


The processor does not recognize the VERW instruction in real or virtual-8086 mode. 


Mnemonic          Opcode      Description
VERW reg/mem16    0F 00 /5    Set the zero flag (ZF) to 1 if the segment selected can be written.


Related Instructions 


ARPL, LAR, LSL, VERR 


rFLAGS Affected 


ID  VIP  VIF  AC  VM  RF  NT  IOPL   OF  DF  IF  TF  SF  ZF  AF  PF  CF
                                                         M
21  20   19   18  17  16  14  13:12  11  10  9   8   7   6   4   2   0


Note: Bits 31:22, 15, 5, 3, and 1 are reserved. A flag set to one or cleared to zero is M (modified). Unaffected flags are 
blank. Undefined flags are U. 


Exceptions

Exception                 Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD       X     X                        This instruction is only recognized in protected mode.
Stack, #SS                                    X          A memory address exceeded the stack segment limit or was non-canonical.
General protection, #GP                       X          A memory address exceeded a data segment limit or was non-canonical.
                                              X          A null data segment was used to access memory.
Page fault, #PF                               X          A page fault resulted from the execution of the instruction.
Alignment check, #AC                          X          An unaligned memory reference was performed while alignment checking was enabled.
VMLOAD Load State from VMCB


Loads a subset of processor state from the VMCB specified by the system-physical address in the rAX 
register. The portion of RAX used to form the address is determined by the effective address size. 


The VMSAVE and VMLOAD instructions complement the state save/restore abilities of VMRUN and 
#VMEXIT, providing access to hidden state that software is otherwise unable to access, plus some 
additional commonly-used state. 


This is a Secure Virtual Machine (SVM) instruction. Support for the SVM architecture and the SVM 
instructions is indicated by CPUID Fn8000_0001_ECX[SVM] = 1. For more information on using the 
CPUID instruction, see the reference page for the CPUID instruction on page 165. 


This instruction generates a #UD exception if SVM is not enabled. See “Enabling SVM” in AMD64
Architecture Programmer’s Manual Volume 2: System Programming, order# 24593.


Mnemonic      Opcode      Description
VMLOAD rAX    0F 01 DA    Load additional state from VMCB.

Action

IF ((MSR_EFER.SVME == 0) || (!PROTECTED_MODE))
    EXCEPTION [#UD]     // This instruction can only be executed in protected
                        // mode with SVM enabled

IF (CPL != 0)           // This instruction is only allowed at CPL 0
    EXCEPTION [#GP]

IF (rAX contains an unsupported system-physical address)
    EXCEPTION [#GP]

Load from a VMCB at system-physical address rAX:
    FS, GS, TR, LDTR (including all hidden state)
    KernelGsBase
    STAR, LSTAR, CSTAR, SFMASK
    SYSENTER_CS, SYSENTER_ESP, SYSENTER_EIP

Related Instructions 


VMSAVE 


rFLAGS Affected 


None. 
Exceptions

Exception                 Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD       X     X             X          The SVM instructions are not supported as indicated by CPUID Fn8000_0001_ECX[SVM] = 0.
                                              X          Secure Virtual Machine was not enabled (EFER.SVME=0).
                          X     X                        The instruction is only recognized in protected mode.
General protection, #GP                       X          CPL was not 0.
                                              X          rAX referenced a physical address above the maximum supported physical address.
                                              X          The address in rAX was not aligned on a 4Kbyte boundary.
VMMCALL Call VMM
VMGEXIT SEV-ES Exit to VMM 


VMMCALL and VMGEXIT provide a mechanism for a non-SEV-ES and an SEV-ES guest,
respectively, to explicitly communicate with the VMM by generating a #VMEXIT.


A non-intercepted VMMCALL unconditionally raises a #UD exception. VMGEXIT is always
intercepted and unconditionally causes a #VMEXIT.


VMMCALL and VMGEXIT instructions are allowed in all modes and at all privilege levels. These
instructions generate a #UD exception if SVM is not enabled. See “Enabling SVM” in AMD64
Architecture Programmer’s Manual Volume 2: System Programming, order# 24593.


VMMCALL and VMGEXIT are Secure Virtual Machine (SVM) instructions. Support for the SVM 
architecture and the SVM instructions is indicated by CPUID Fn8000_0001_ECX[SVM] = 1. Support 
for VMGEXIT instruction is indicated by CPUID Fn8000_001F_EAX[SEV-ES] = 1. The VMGEXIT 
encoding is interpreted as VMMCALL on processors that do not explicitly support VMGEXIT, 
including legacy processors, or if VMGEXIT instruction is not executed by an SEV-ES guest. For 
more information on using the CPUID instruction, see the reference page for the CPUID instruction on 
page 165. 


Mnemonic    Opcode            Description
VMMCALL     0F 01 D9          Explicit communication with the VMM.
VMGEXIT     F2/F3 0F 01 D9    Explicit communication with the VMM for SEV-ES guests.


Related Instructions 


None. 

rFLAGS Affected 

None. 

Exceptions

Exception                 Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD       X     X             X          The SVM instructions are not supported as indicated by CPUID Fn8000_0001_ECX[SVM] = 0.
                          X     X             X          Secure Virtual Machine was not enabled (EFER.SVME=0).
                          X     X             X          VMMCALL was not intercepted.
VMRUN Run Virtual Machine


Starts execution of a guest instruction stream. The physical address of the virtual machine control 
block (VMCB) describing the guest is taken from the rAX register (the portion of RAX used to form 
the address is determined by the effective address size). The physical address of the VMCB must be 
aligned on a 4KB boundary. 
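
A hypervisor can validate a candidate VMCB address before executing VMRUN. The Python sketch below is a minimal model under stated assumptions: the function names are hypothetical, and phys_addr_bits stands for the implemented physical-address width, which software is assumed to obtain separately (for example, from CPUID). It mirrors the two #GP causes listed in the exceptions table for this instruction.

```python
PAGE_SIZE = 4096  # the VMCB must be aligned on a 4-Kbyte boundary


def vmcb_address(rax: int, address_size_bits: int) -> int:
    """Keep only the portion of RAX selected by the effective address size."""
    return rax & ((1 << address_size_bits) - 1)


def vmcb_address_is_valid(rax: int, address_size_bits: int, phys_addr_bits: int) -> bool:
    """Check the two address-related #GP conditions (illustrative only)."""
    addr = vmcb_address(rax, address_size_bits)
    if addr >> phys_addr_bits:        # above the maximum supported physical address
        return False
    return addr % PAGE_SIZE == 0      # must be 4-Kbyte aligned
```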


VMRUN saves a subset of host processor state to the host state-save area specified by the physical 
address in the VM_HSAVE_PA MSR. VMRUN then loads guest processor state (and control 
information) from the VMCB at the physical address specified in rAX. The processor then executes 
guest instructions until one of several intercept events (specified in the VMCB) is triggered. When an 
intercept event occurs, the processor stores a snapshot of the guest state back into the VMCB, reloads 
the host state, and continues execution of host code at the instruction following the VMRUN 
instruction. 


This is a Secure Virtual Machine (SVM) instruction. Support for the SVM architecture and the SVM 
instructions is indicated by CPUID Fn8000_0001_ECX[SVM] = 1. For more information on using the 
CPUID instruction, see the reference page for the CPUID instruction on page 165. 


This instruction generates a #UD exception if SVM is not enabled. See “Enabling SVM” in AMD64
Architecture Programmer’s Manual Volume 2: System Programming, order# 24593.


The VMRUN instruction is not supported in System Management Mode. Processor behavior resulting 
from an attempt to execute this instruction from within the SMM handler is undefined. 


Instruction Encoding 


Mnemonic     Opcode      Description
VMRUN rAX    0F 01 D8    Performs a world-switch to guest.

Action 
IF ((MSR_EFER.SVME == 0) || (!PROTECTED_MODE))
    EXCEPTION [#UD]     // This instruction can only be executed in protected
                        // mode with SVM enabled

IF (CPL != 0)           // This instruction is only allowed at CPL 0
    EXCEPTION [#GP]

IF (rAX contains an unsupported physical address)
    EXCEPTION [#GP]

IF (intercepted(VMRUN))
    #VMEXIT (VMRUN)

remember VMCB address (delivered in rAX) for next #VMEXIT
save host state to physical memory indicated in the VM_HSAVE_PA MSR:
    ES.sel
    CS.sel
    SS.sel
    DS.sel
    GDTR.{base,limit}
    IDTR.{base,limit}
    EFER
    CR0
    CR4
    CR3
    // host CR2 is not saved
    RFLAGS
    RIP
    RSP
    RAX

from the VMCB at physical address rAX, load control information:
    intercept vector
    TSC_OFFSET
    interrupt control (v_irq, v_intr_*, v_tpr)
    EVENTINJ field
    ASID

    IF (nested paging supported)
        NP_ENABLE
        IF (NP_ENABLE == 1)
            nCR3

from the VMCB at physical address rAX, load guest state:
    ES.{base,limit,attr,sel}
    CS.{base,limit,attr,sel}
    SS.{base,limit,attr,sel}
    DS.{base,limit,attr,sel}
    GDTR.{base,limit}
    IDTR.{base,limit}
    EFER
    CR0
    CR4
    CR3
    CR2
    IF (NP_ENABLE == 1)
        gPAT            // Leaves host hPAT register unchanged.
    RFLAGS
    RIP
    RSP
    RAX
    DR7
    DR6
    CPL                 // 0 for real mode, 3 for v86 mode, else as loaded.
    INTERRUPT_SHADOW

IF (LBR virtualization supported)
    LBR_VIRTUALIZATION_ENABLE
    IF (LBR_VIRTUALIZATION_ENABLE == 1)
        save LBR state to the host save area
            DBGCTL
            BR_FROM
            BR_TO
            LASTEXCP_FROM
            LASTEXCP_TO
        load LBR state from the VMCB
            DBGCTL
            BR_FROM
            BR_TO
            LASTEXCP_FROM
            LASTEXCP_TO

IF (guest state consistency checks fail)
    #VMEXIT (INVALID)

Execute command stored in TLB_CONTROL.

GIF = 1                 // allow interrupts in the guest
IF (EVENTINJ.V)
    cause exception/interrupt in guest
else
    jump to first guest instruction


Upon #VMEXIT, the processor performs the following actions in order to return to the host execution 
context: 


GIF = 0
save guest state to VMCB:
    ES.{base,limit,attr,sel}
    CS.{base,limit,attr,sel}
    SS.{base,limit,attr,sel}
    DS.{base,limit,attr,sel}
    GDTR.{base,limit}
    IDTR.{base,limit}
    EFER
    CR4
    CR3
    CR2
    CR0
    if (nested paging enabled)
        gPAT
    RFLAGS
    RIP
    RSP
    RAX
    DR7
    DR6
    CPL
    INTERRUPT_SHADOW
save additional state and intercept information:
    V_IRQ, V_TPR
    EXITCODE
    EXITINFO1
    EXITINFO2
    EXITINTINFO
Clear EVENTINJ field in VMCB

prepare for host mode by clearing internal processor state bits:
    clear intercepts
    clear v_irq
    clear v_intr_masking
    clear tsc_offset
    disable nested paging
    clear ASID to zero

reload host state
    GDTR.{base,limit}
    IDTR.{base,limit}
    EFER
    CR0
    CR0.PE = 1          // saved copy of CR0.PE is ignored
    CR4
    CR3
    if (host is in PAE paging mode)
        reload host PDPEs
    // Do not reload host CR2 or PAT
    RFLAGS
    RIP
    RSP
    RAX
    DR7 = “all disabled”
    CPL = 0
    ES.sel; reload segment descriptor from GDT
    CS.sel; reload segment descriptor from GDT
    SS.sel; reload segment descriptor from GDT
    DS.sel; reload segment descriptor from GDT

if (LBR virtualization supported)
    LBR_VIRTUALIZATION_ENABLE
    if (LBR_VIRTUALIZATION_ENABLE == 1)
        save LBR state to the VMCB:
            DBGCTL
            BR_FROM
            BR_TO
            LASTEXCP_FROM
            LASTEXCP_TO
        load LBR state from the host save area:
            DBGCTL
            BR_FROM
            BR_TO
            LASTEXCP_FROM
            LASTEXCP_TO

if (illegal host state loaded, or exception while loading host state)
    shutdown
else
    execute first host instruction following the VMRUN


Related Instructions 


VMLOAD, VMSAVE. 


rFLAGS Affected 
None. 
Exceptions

Exception                 Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD       X     X             X          The SVM instructions are not supported as indicated by CPUID Fn8000_0001_ECX[SVM] = 0.
                                              X          Secure Virtual Machine was not enabled (EFER.SVME=0).
                          X     X                        The instruction is only recognized in protected mode.
General protection, #GP                       X          CPL was not 0.
                                              X          rAX referenced a physical address above the maximum supported physical address.
                                              X          The address in rAX was not aligned on a 4Kbyte boundary.
VMSAVE Save State to VMCB


Stores a subset of the processor state into the VMCB specified by the system-physical address in the 
rAX register (the portion of RAX used to form the address is determined by the effective address size). 


The VMSAVE and VMLOAD instructions complement the state save/restore abilities of VMRUN and 
#VMEXIT, providing access to hidden state that software is otherwise unable to access, plus some 
additional commonly-used state. 


This is a Secure Virtual Machine (SVM) instruction. Support for the SVM architecture and the SVM 
instructions is indicated by CPUID Fn8000_0001_ECX[SVM] = 1. For more information on using the 
CPUID instruction, see the reference page for the CPUID instruction on page 165. 


This instruction generates a #UD exception if SVM is not enabled. See “Enabling SVM” in AMD64
Architecture Programmer’s Manual Volume 2: System Programming, order# 24593.


Instruction Encoding 


Mnemonic      Opcode      Description
VMSAVE rAX    0F 01 DB    Save additional guest state to VMCB.

Action

IF ((MSR_EFER.SVME == 0) || (!PROTECTED_MODE))
    EXCEPTION [#UD]     // This instruction can only be executed in protected
                        // mode with SVM enabled

IF (CPL != 0)           // This instruction is only allowed at CPL 0
    EXCEPTION [#GP]

IF (rAX contains an unsupported system-physical address)
    EXCEPTION [#GP]

Store to a VMCB at system-physical address rAX:
    FS, GS, TR, LDTR (including all hidden state)
    KernelGsBase
    STAR, LSTAR, CSTAR, SFMASK
    SYSENTER_CS, SYSENTER_ESP, SYSENTER_EIP


Related Instructions 


VMLOAD 


rFLAGS Affected 


None. 
Exceptions

Exception                 Real  Virtual-8086  Protected  Cause of Exception
Invalid opcode, #UD       X     X             X          The SVM instructions are not supported as indicated by CPUID Fn8000_0001_ECX[SVM] = 0.
                                              X          Secure Virtual Machine was not enabled (EFER.SVME=0).
                          X     X                        The instruction is only recognized in protected mode.
General protection, #GP                       X          CPL was not 0.
                                              X          rAX referenced a physical address above the maximum supported physical address.
                                              X          The address in rAX was not aligned on a 4Kbyte boundary.
WBINVD Writeback and Invalidate Caches
WBNOINVD Writeback With No Invalidate 


WBINVD writes all modified lines in all levels of cache associated with this processor to main 
memory and invalidates the caches. This may or may not include lower level caches associated with 
another processor that shares any level of this processor's cache hierarchy. WBNOINVD does not 
invalidate the caches, instead leaving all (or most) cache lines in the cache hierarchy in non-modified 
state, but in all other respects it behaves the same as WBINVD. 


CPUID Fn8000_001D_EDX[WBINVD]_xN indicates the behavior of the operation at various levels
of the cache hierarchy, for both WBINVD and WBNOINVD, with respect to lower branches in the 
cache hierarchy. If the feature bit is 0, the instruction causes the write back and (for WBINVD) 
invalidation of all lower level caches of other processors sharing the designated level of cache. If the 
feature bit is 1, the instruction does not necessarily cause the write back and invalidation of all lower 
level caches of other processors sharing the designated level of cache. See Appendix E, “Obtaining 
Processor Information Via the CPUID Instruction,” on page 593 for more information on using the 
CPUID function. 


The INVD instruction can be used when cache coherence with memory is not important. 
These instructions do not invalidate TLB caches. 


These are privileged instructions. The current privilege level of a procedure invalidating the 
processor’s internal caches must be zero. 


WBINVD and WBNOINVD are serializing instructions.


Support for WBNOINVD is indicated by CPUID Fn8000_0008_EBX[WBNOINVD] = 1. However, 
the encoding of WBNOINVD results in it being interpreted as WBINVD on processors that do not 
explicitly support WBNOINVD, including legacy processors. For more information on using the 
CPUID instruction, see the description of the CPUID instruction on page 165. 


On some processor implementations, WBINVD and WBNOINVD can be made interruptible by 
setting EFER.INTWB to 1. When this bit is set, the processor periodically checks for all types of 
interrupts (SMI, INTR, NMI, etc.) while flushing the caches. If an interrupt is observed, the processor 
stops flushing the caches, saves the instruction pointer and transfers control to the interrupt handler. 
Upon returning (via an IRET), the processor restarts the flush process from the beginning as lines will 
have been modified and cached while executing the interrupt handler. Support for setting 
EFER.INTWB is indicated by CPUID Fn8000_0008_EBX[INT_WBINVD] (bit 13) = 1.
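
The two feature checks described above can be sketched in C as follows. This is only an illustration that operates on an EBX value already obtained from CPUID Fn8000_0008. The INT_WBINVD bit position (13) is stated in the text; the WBNOINVD bit position (9) is an assumption that is not given in this section and should be confirmed against Appendix E. The function and macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions within the EBX value returned by CPUID Fn8000_0008. */
#define WBNOINVD_BIT     9u  /* assumption: not stated in this section */
#define INT_WBINVD_BIT  13u  /* stated in the text above */

/* Returns true if WBNOINVD is explicitly supported. When this is false,
 * the same encoding is interpreted as WBINVD. */
bool has_wbnoinvd(uint32_t ebx)
{
    return ((ebx >> WBNOINVD_BIT) & 1u) != 0;
}

/* Returns true if EFER.INTWB may be set to make the flush interruptible. */
bool has_int_wbinvd(uint32_t ebx)
{
    return ((ebx >> INT_WBINVD_BIT) & 1u) != 0;
}
```

Because an unsupported WBNOINVD executes as WBINVD, software that only needs a write back can issue the WBNOINVD encoding unconditionally and use the feature bit merely to learn whether the caches remain valid afterwards.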


Mnemonic     Opcode      Description

WBINVD       0F 09       Write modified cache lines to main memory, invalidate
                         internal caches, and trigger external cache flushes.

WBNOINVD     F3 0F 09    Write modified cache lines to main memory and trigger
                         external cache flushes.


Related Instructions


CLFLUSH, CLWB, INVD 


rFLAGS Affected 


None 


Exceptions

Exception                  Real   Virtual 8086   Protected   Cause of Exception
General protection, #GP                X             X       CPL was not 0.


WRMSR Write to Model-Specific Register


Writes data to 64-bit model-specific registers (MSRs). These registers are widely used in 
performance-monitoring and debugging applications, as well as testability and program execution 
tracing. 


This instruction writes the contents of the EDX:EAX register pair into a 64-bit model-specific register 
specified in the ECX register. The 32 bits in the EDX register are mapped into the high-order bits of 
the model-specific register and the 32 bits in EAX form the low-order 32 bits. 
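
As a small illustration of the register mapping just described, the following C sketch splits a 64-bit MSR value into the EDX and EAX halves that WRMSR consumes, and joins them again. The type and function names are hypothetical, and the sketch only prepares operands; it does not execute WRMSR, which requires CPL 0.

```c
#include <stdint.h>

/* The two 32-bit register operands consumed by WRMSR. */
struct msr_halves {
    uint32_t eax;  /* low-order 32 bits of the MSR value */
    uint32_t edx;  /* high-order 32 bits of the MSR value */
};

/* Split a 64-bit MSR value into the EDX:EAX pair. */
struct msr_halves msr_split(uint64_t value)
{
    struct msr_halves h;
    h.eax = (uint32_t)(value & 0xFFFFFFFFu);
    h.edx = (uint32_t)(value >> 32);
    return h;
}

/* Rebuild the 64-bit value from EDX:EAX (the RDMSR direction). */
uint64_t msr_join(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```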


This instruction must be executed at a privilege level of 0 or a general protection fault #GP(0) will be 
raised. This exception is also generated if an attempt is made to specify a reserved or unimplemented 
model-specific register in ECX. 


WRMSR is a serializing instruction for most MSRs; however, some x2APIC and AVIC MSRs may
have relaxed serialization semantics. See the APIC and AVIC sections in Volume 2 for details.


Support for the WRMSR instruction is indicated by CPUID Fn0000_0001_EDX[MSR] = 1 OR 
CPUID Fn8000_0001_EDX[MSR] = 1. For more information on using the CPUID instruction, see the 
description of the CPUID instruction on page 165. 


The CPUID instruction can provide model information useful in determining the existence of a 
particular MSR. 


See “Model-Specific Registers (MSRs)” in Volume 2: System Programming, for more information 
about model-specific registers, machine check architecture, performance monitoring and debug 
registers. 


Mnemonic Opcode Description 
WRMSR 0F 30 Write EDX:EAX to the MSR specified by ECX.


Related Instructions 


RDMSR 


rFLAGS Affected 


None 
Exceptions

Exception                  Real   Virtual 8086   Protected   Cause of Exception
Invalid opcode, #UD         X          X             X       The WRMSR instruction is not supported, as indicated by
                                                             CPUID Fn0000_0001_EDX[MSR] = 0 OR CPUID
                                                             Fn8000_0001_EDX[MSR] = 0.
General protection, #GP                X             X       CPL was not 0.
                            X                        X       The value in ECX specifies a reserved or unimplemented
                                                             MSR address.
                            X                        X       Writing 1 to any bit that must be zero (MBZ) in the MSR.
                            X                        X       Writing a non-canonical value to an MSR that can only be
                                                             written with canonical values.


WRPKRU Write Protection Key Rights


Writes the value in EAX to the 32-bit Protection Key Rights (PKRU) register. This instruction
forces strong memory ordering between load and store instructions preceding the WRPKRU and
load and store instructions that follow the WRPKRU.


This instruction must be executed with ECX=0 and EDX=0, otherwise a general protection fault 
(#GP) is generated. The upper 32 bits of RCX and RDX are ignored. The WRPKRU instruction 
ignores operand size overrides. 


Memory protection keys must be enabled (CR4.PKE=1), otherwise executing this instruction 
generates an invalid opcode fault (#UD). 


Software can check that system software has enabled memory protection keys (CR4.PKE=1) by 
testing CPUID Function 0000_0007h_ECX[OSPKE]. (See Section 5, “Protection Key Rights for 
User Pages” in AMD64 Architecture Programmer’s Manual Volume 2 for more information on 
memory protection keys.) 


WRPKRU can be executed at any privilege level. 
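
The operand rules above can be summarized with a small software model. The following C sketch is an illustration only: the structure, enum, and function names are hypothetical, and checking #UD before #GP is an assumption, since the text does not state the order in which the two conditions are evaluated.

```c
#include <stdbool.h>
#include <stdint.h>

enum wrpkru_result { WRPKRU_OK, WRPKRU_UD, WRPKRU_GP };

/* Minimal model of the state WRPKRU depends on. */
struct pk_model {
    bool     cr4_pke;  /* CR4.PKE: memory protection keys enabled */
    uint32_t pkru;     /* modeled PKRU register */
};

/* Model of WRPKRU. Takes the full 64-bit RAX, RCX and RDX values to show
 * that only the low 32 bits of each register are examined. */
enum wrpkru_result model_wrpkru(struct pk_model *cpu,
                                uint64_t rax, uint64_t rcx, uint64_t rdx)
{
    if (!cpu->cr4_pke)
        return WRPKRU_UD;                 /* CR4.PKE = 0 */
    if ((uint32_t)rcx != 0 || (uint32_t)rdx != 0)
        return WRPKRU_GP;                 /* ECX or EDX was not zero */
    cpu->pkru = (uint32_t)rax;            /* PKRU = EAX */
    return WRPKRU_OK;
}
```

In real code the instruction is typically preceded by explicitly zeroing ECX and EDX, because stale values in those registers are the most likely cause of the #GP case.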


Mnemonic Opcode Description 
WRPKRU 0F 01 EF Write the value in EAX to the PKRU register.


Related Instructions 


RDPKRU 


rFLAGS Affected 


None 


Exceptions

Exception                  Real   Virtual 8086   Protected   Cause of Exception
Invalid opcode, #UD         X          X             X       CR4.PKE=0
General protection, #GP                              X       ECX was not zero or EDX was not zero


WRSS Write to Shadow Stack


Writes 4 or 8 bytes from the source register operand to the specified address in a shadow stack page.
The operand size is 8 bytes in 64-bit mode when REX.W is set to 1, and 4 bytes in all other cases.


If shadow stacks are not enabled at the current privilege level, or if WRSS is not enabled at the current
privilege level, a #UD exception is generated.


Mnemonic Opcode Description 
WRSS mem32, reg32     66 0F 38 F6    Write 4 bytes to shadow stack at mem32
WRSSQ mem64, reg64    66 0F 38 F6    Write 8 bytes to shadow stack at mem64
Action


// see "Pseudocode Definition" on page 57

IF (CPL == 3)
{
    IF ((CR4.CET && U_CET.SH_STK_EN) == 0)
        EXCEPTION [#UD]
    IF (U_CET.WR_SHSTK_EN == 0)
        EXCEPTION [#UD]    // WRSS not enabled in U_CET
}
ELSE    // CPL < 3
{
    IF ((CR4.CET && S_CET.SH_STK_EN) == 0)
        EXCEPTION [#UD]
    IF (S_CET.WR_SHSTK_EN == 0)
        EXCEPTION [#UD]    // WRSS not enabled in S_CET
}

IF (OPERAND_SIZE == 64)
{
    temp_LinAdr = Linear_Address(mem64)
    IF (temp_LinAdr is 8-byte aligned)
        SSTK_WRITE_MEM.q[temp_LinAdr] = reg64[63:0]    // write reg64 to shadow stack
    ELSE
        EXCEPTION [#GP(0)]
}
ELSE
{
    temp_LinAdr = Linear_Address(mem32)
    IF (temp_LinAdr is 4-byte aligned)
        SSTK_WRITE_MEM.d[temp_LinAdr] = reg32[31:0]    // write reg32 to shadow stack
    ELSE
        EXCEPTION [#GP(0)]
}

EXIT
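
For readers who prefer compilable code, the enablement and alignment checks in the action above can be modeled in C roughly as follows. This is a sketch under stated assumptions, not a definition of the instruction: the structure and function names are hypothetical, the shadow stack write itself and any resulting page fault are not modeled, and the caller supplies an already computed linear address.

```c
#include <stdbool.h>
#include <stdint.h>

enum wrss_result { WRSS_OK, WRSS_UD, WRSS_GP };

/* Control bits consulted by WRSS (field names follow the pseudocode). */
struct cet_state {
    bool cr4_cet;
    bool u_sh_stk_en;    /* U_CET.SH_STK_EN */
    bool u_wr_shstk_en;  /* U_CET.WR_SHSTK_EN */
    bool s_sh_stk_en;    /* S_CET.SH_STK_EN */
    bool s_wr_shstk_en;  /* S_CET.WR_SHSTK_EN */
};

/* Model of the checks performed before the shadow stack write.
 * operand_bits is 64 for the REX.W form and 32 otherwise. */
enum wrss_result model_wrss_checks(const struct cet_state *s, unsigned cpl,
                                   uint64_t lin_addr, unsigned operand_bits)
{
    /* CPL 3 consults U_CET, every other CPL consults S_CET. */
    bool sh_en = (cpl == 3) ? s->u_sh_stk_en : s->s_sh_stk_en;
    bool wr_en = (cpl == 3) ? s->u_wr_shstk_en : s->s_wr_shstk_en;

    if (!(s->cr4_cet && sh_en))
        return WRSS_UD;  /* shadow stacks not enabled at this CPL */
    if (!wr_en)
        return WRSS_UD;  /* WRSS not enabled at this CPL */

    /* The address must be naturally aligned to the operand size. */
    uint64_t align = (operand_bits == 64) ? 8u : 4u;
    if (lin_addr % align != 0)
        return WRSS_GP;

    return WRSS_OK;
}
```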


Related Instructions 


WRUSS 


rFLAGS Affected 


None 


Exceptions

Exception                  Real   Virtual 8086   Protected   Cause of Exception
Invalid opcode, #UD         X          X                     Instruction is only recognized in protected mode.
                                                     X       CR4.CET = 0.
                                                     X       Shadow stacks are not enabled at the current privilege
                                                             level.
                                                     X       If CPL == 3 and U_CET.WR_SHSTK_EN = 0.
                                                     X       If CPL != 3 and S_CET.WR_SHSTK_EN = 0.
                                                     X       If mod=11b (register destination was specified).
General protection, #GP                              X       Address not 8-byte aligned for 64-bit operand size.
                                                     X       Address not 4-byte aligned for 32-bit operand size.
                                                     X       A memory address exceeded a data segment limit.
                                                     X       In long mode, the address of the memory operand was
                                                             non-canonical.
                                                     X       A null data segment was used to reference memory.
                                                     X       A non-writeable data segment was used.
                                                     X       An execute-only code segment was used to reference
                                                             memory.
Stack, #SS                                           X       A memory address exceeded the stack segment limit or
                                                             was non-canonical.
Page fault, #PF                                      X       A page fault resulted from the execution of the instruction.
                                                     X       The destination was not a shadow stack page.


WRUSS Write to User Shadow Stack


Writes 4 or 8 bytes from the source register operand to the specified address in a user shadow stack
page. The write is performed with user-mode shadow stack semantics. The operand size is 8 bytes in
64-bit mode when REX.W is set to 1, and 4 bytes in all other cases.


The destination must be a user shadow stack page; otherwise a #PF exception is generated. WRUSS is
a privileged instruction and must be executed with CPL=0; otherwise a #GP exception is generated.


Mnemonic Opcode Description 
WRUSSD mem32, reg32    66 0F 38 F5    Write 4 bytes to user shadow stack
WRUSSQ mem64, reg64    66 0F 38 F5    Write 8 bytes to user shadow stack
Action


// see "Pseudocode Definition" on page 57

IF (CR4.CET == 0)
    EXCEPTION [#UD]

IF (CPL != 0)
    EXCEPTION [#GP(0)]

IF (OPERAND_SIZE == 64)
{
    temp_LinAdr = Linear_Address(mem64)
    IF (temp_LinAdr is 8-byte aligned)
        SSTK_WRITE_MEM.q[temp_LinAdr] = reg64[63:0]    // write as user access
    ELSE
        EXCEPTION [#GP(0)]
}
ELSE
{
    temp_LinAdr = Linear_Address(mem32)
    IF (temp_LinAdr is 4-byte aligned)
        SSTK_WRITE_MEM.d[temp_LinAdr] = reg32[31:0]    // write as user access
    ELSE
        EXCEPTION [#GP(0)]
}

EXIT


Related Instructions 


WRSS 


rFLAGS Affected 


None 
Exceptions

Exception                  Real   Virtual 8086   Protected   Cause of Exception
Invalid opcode, #UD         X          X                     Instruction is only recognized in protected mode.
                                                     X       CR4.CET = 0.
                                                     X       If mod=11b (register destination was specified).
General protection, #GP                              X       If CPL != 0.
                                                     X       Address not 8-byte aligned for 64-bit operand size.
                                                     X       Address not 4-byte aligned for 32-bit operand size.
                                                     X       A memory address exceeded a data segment limit.
                                                     X       In long mode, the address of the memory operand was
                                                             non-canonical.
                                                     X       A null data segment was used to reference memory.
                                                     X       A non-writeable data segment was used.
                                                     X       An execute-only code segment was used to reference
                                                             memory.
Stack, #SS                                           X       A memory address exceeded the stack segment limit or
                                                             was non-canonical.
Page fault, #PF                                      X       The linear address is not a user shadow stack page.
                                                     X       A page fault resulted from the execution of the instruction.


Appendix A Opcode and Operand Encodings 


This appendix specifies the opcode and operand encodings for each instruction in the AMD64 
instruction set. As discussed in Chapter 1, “Instruction Encoding,” the basic operation and implied 
operand type(s) of an instruction are encoded by the binary value of the opcode byte. The 
correspondence between an opcode binary value and its meaning is provided by the opcode map. 


Each opcode map has 256 entries and can encode up to 256 different operations. Since the AMD64 
instruction set comprises more than 256 instructions, multiple opcode maps are utilized to encode the 
instruction set. A particular opcode map is selected using the instruction encoding syntax diagrammed 
in Figure 1-1 on page 2. For each opcode map, values may be reserved or utilized for purposes other 
than encoding an instruction operation. 


To preserve compatibility with future instruction architectural extensions, reserved opcodes should not 
be used. If a means to reliably cause an invalid-opcode exception (#UD) is required, software should 
use one of the UDx opcodes. These opcodes are set aside for this purpose and will not be used for 
future instructions. The UD opcodes are located on the secondary opcode map at code points B9h,
0Bh, and FFh.


The following section provides a key to the notation used in the opcode maps to specify the implied 
operand types. 


Opcode-Syntax Notation 


In the opcode maps which follow, each table entry represents a specific form of an instruction, 
identifying the instruction by its mnemonic and listing the operand or operands peculiar to that 
opcode. If a register-based operand is specified by the opcode itself, the operand is represented directly 
using the register mnemonic as defined in “Summary of Registers and Data Types” on page 38. If the 
operand is encoded in one or more bytes following the opcode byte, the following special notation is 
used to represent the operand and its encoding in more generic terms. 


This special notation, used exclusively in the opcode maps, is composed of three parts: 


* an initial capital letter that represents the operand source / destination (register-based,
memory-based, or immediate) and how it is encoded in the instruction (either as an immediate, or via
the ModRM.reg, ModRM.{mod,r/m}, or VEX/XOP.vvvv fields). For register-based operands, the
initial letter also specifies the register type (General-purpose, MMX, YMM/XMM, debug, or
control register).


* a one-, two-, or three-letter modifier (in lowercase) that represents the data type (for example, byte,
word, quadword, packed single-precision floating-point vector).


* x, which indicates for an SSE instruction that the instruction supports both vector sizes (128 bits
and 256 bits). The specific vector size is encoded in the VEX/XOP.L field. L=0 indicates 128 bits
and L=1 indicates 256 bits.
The following list describes the meaning of each letter that is used in the first position of the operand
notation:


A    A far pointer encoded in the instruction. No ModRM byte in the instruction encoding.

B    General-purpose register specified by the VEX or XOP vvvv field.

C    Control register specified by the ModRM.reg field.

D    Debug register specified by the ModRM.reg field.

E    General purpose register or memory operand specified by the r/m field of the ModRM byte. For
     memory operands, the ModRM byte may be followed by a SIB byte to specify one of the indexed
     register-indirect addressing forms.

F    rFLAGS register.

G    General purpose register specified by the ModRM.reg field.

H    YMM or XMM register specified by the VEX/XOP.vvvv field.

I    Immediate value encoded in the instruction immediate field.

J    The instruction encoding includes a relative offset that is added to the rIP.

L    YMM or XMM register specified using the most-significant 4 bits of an 8-bit immediate value.
     In legacy or compatibility mode the most significant bit is ignored.

M    A memory operand specified by the {mod, r/m} field of the ModRM byte. ModRM.mod ≠ 11b.

M*   A sparse array of memory operands addressed using the VSIB addressing mode. See “VSIB
     Addressing” in Volume 4.

N    64-bit MMX register specified by the ModRM.r/m field. The ModRM.mod field must be 11b.

O    The offset of an operand is encoded in the instruction. There is no ModRM byte in the instruction
     encoding. Indexed register-indirect addressing using the SIB byte is not supported.

P    64-bit MMX register specified by the ModRM.reg field.

Q    64-bit MMX-register or memory operand specified by the {mod, r/m} field of the ModRM byte.
     For memory operands, the ModRM byte may be followed by a SIB byte to specify one of the
     indexed register-indirect addressing forms.

R    General purpose register specified by the ModRM.r/m field. The ModRM.mod field must be
     11b.

S    Segment register specified by the ModRM.reg field.

U    YMM/XMM register specified by the ModRM.r/m field. The ModRM.mod field must be 11b.

V    YMM/XMM register specified by the ModRM.reg field.

W    YMM/XMM register or memory operand specified by the {mod, r/m} field of the ModRM byte.
     For memory operands, the ModRM byte may be followed by a SIB byte to specify one of the
     indexed register-indirect addressing forms.

X    A memory operand addressed by the DS.rSI registers. Used in string instructions.

Y    A memory operand addressed by the ES.rDI registers. Used in string instructions.


The following list provides the key for the second part of the operand notation: 


a    Two 16-bit or 32-bit memory operands, depending on the effective operand size. Used in the
     BOUND instruction.

b    A byte, irrespective of the effective operand size.

c    A byte or a word, depending on the effective operand size.

d    A doubleword (32 bits), irrespective of the effective operand size.

do   A double octword (256 bits), irrespective of the effective operand size.

i    A 16-bit integer.

j    A 32-bit integer.

m    A bit mask of size equal to the source operand.

mn   Where n = 2, 4, 8, or 16. A bit mask of size n.

o    An octword (128 bits), irrespective of the effective operand size.

o.q  Operand is either the upper or lower half of a 128-bit value.

p    A 32- or 48-bit far pointer, depending on 16- or 32-bit effective operand size.

pb   Vector with byte-wide (8-bit) elements (packed byte).

pd   A double-precision (64-bit) floating-point vector operand (packed double-precision).

pdw  Vector composed of 32-bit doublewords.

ph   A half-precision (16-bit) floating-point vector operand (packed half-precision).

pi   Vector composed of 16-bit integers (packed integer).

pj   Vector composed of 32-bit integers (packed double integer).

pk   Vector composed of 8-bit integers (packed half-word integer).

pq   Vector composed of 64-bit integers (packed quadword integer).

pqw  Vector composed of 64-bit quadwords (packed quadword).

ps   A single-precision floating-point vector operand (packed single-precision).

pw   Vector composed of 16-bit words (packed word).

q    A quadword (64 bits), irrespective of the effective operand size.

s    A 6-byte or 10-byte pseudo-descriptor.

sd   A scalar double-precision floating-point operand (scalar double).

sj   A scalar doubleword (32-bit) integer operand (scalar double integer).

ss   A scalar single-precision floating-point operand (scalar single).

v    A word, doubleword, or quadword (in 64-bit mode), depending on the effective operand size.

w    A word, irrespective of the effective operand size.

x    Instruction supports both vector sizes (128 bits or 256 bits). Size is encoded using the
     VEX/XOP.L field. (L=0: 128 bits; L=1: 256 bits). This symbol may be appended to ps or pd to
     represent a packed single- or double-precision floating-point vector of either size; or to pk, pi, pj,
     or pq, to represent a packed 8-bit, 16-bit, 32-bit, or 64-bit integer vector of either size.

y    A doubleword or quadword depending on effective operand size.

z    A word if the effective operand size is 16 bits, or a doubleword if the effective operand size is 32
     or 64 bits.


For some instructions, fields in the ModRM or SIB byte are used as encoding extensions. This is
indicated using the following notation:

/n   A ModRM-byte reg field or SIB-byte base field, where n is a value between zero (000b) and 7
     (111b).

For SSE instructions that take scalar operands, the VEX/XOP.L field is ignored.


For immediates and memory-based operands, only the size and not the datatype is indicated. Operand 
widths and datatypes are specified based on the source operands. For instructions where the result 
overwrites one of the source registers, the data width and datatype of the result may not match that of 
the source register. See individual instruction descriptions for more details. 


A.1 Opcode Maps 
In all of the following opcode maps, cells shaded grey represent reserved opcodes. 
A.1.1 Legacy Opcode Maps 


Primary Opcode Map. Tables A-1 and A-2 below show the primary opcode map (known in legacy 
terminology as one-byte opcodes). 


Table A-1 below shows those instructions for which the low nibble is in the range 0-7h. Table A-2 on
page 508 shows those instructions for which the low nibble is in the range 8-Fh. In both tables, the
rows show the full range (0-Fh) of the high nibble, and the columns show the specified range of the
low nibble.
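
To make the row and column layout concrete, the following C sketch maps a one-byte opcode to its position in the primary opcode map. The structure and function names are hypothetical; the sketch only computes which table, row, and column to consult and does not look up the instruction itself.

```c
#include <stdint.h>

/* Position of a one-byte opcode in the primary opcode map. */
struct opcode_cell {
    unsigned row;     /* high nibble, 0-Fh */
    unsigned column;  /* low nibble, 0-Fh */
    int      table;   /* 1 for Table A-1 (low nibble 0-7h), 2 for Table A-2 (8-Fh) */
};

struct opcode_cell primary_map_cell(uint8_t opcode)
{
    struct opcode_cell c;
    c.row    = opcode >> 4;
    c.column = opcode & 0x0Fu;
    c.table  = (c.column <= 7u) ? 1 : 2;
    return c;
}
```

For example, opcode 90h lands in row 9, column 0 of Table A-1, and opcode CCh lands in row C, column C of Table A-2.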
Table A-1. Primary Opcode Map (One-byte Opcodes), Low Nibble 0-7h

Nibble¹ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
0 | ADD Eb, Gb | ADD Ev, Gv | ADD Gb, Eb | ADD Gv, Ev | ADD AL, Ib | ADD rAX, Iz | PUSH ES³ | POP ES³
1 | ADC Eb, Gb | ADC Ev, Gv | ADC Gb, Eb | ADC Gv, Ev | ADC AL, Ib | ADC rAX, Iz | PUSH SS³ | POP SS³
2 | AND Eb, Gb | AND Ev, Gv | AND Gb, Eb | AND Gv, Ev | AND AL, Ib | AND rAX, Iz | seg ES⁶ | DAA³
3 | XOR Eb, Gb | XOR Ev, Gv | XOR Gb, Eb | XOR Gv, Ev | XOR AL, Ib | XOR rAX, Iz | seg SS⁶ | AAA³
4 | INC eAX / REX prefix⁵ | INC eCX / REX prefix⁵ | INC eDX / REX prefix⁵ | INC eBX / REX prefix⁵ | INC eSP / REX prefix⁵ | INC eBP / REX prefix⁵ | INC eSI / REX prefix⁵ | INC eDI / REX prefix⁵
5 | PUSH rAX/r8 | PUSH rCX/r9 | PUSH rDX/r10 | PUSH rBX/r11 | PUSH rSP/r12 | PUSH rBP/r13 | PUSH rSI/r14 | PUSH rDI/r15
6 | PUSHA³, PUSHAD³ | POPA³, POPAD³ | BOUND³ Gv, Ma | ARPL³ Ew, Gw / MOVSXD⁴ Gv, Ez | seg FS prefix | seg GS prefix | operand size override prefix | address size override prefix
7 | JO Jb | JNO Jb | JB Jb | JNB Jb | JZ Jb | JNZ Jb | JBE Jb | JNBE Jb
8 | Group 1² Eb, Ib | Group 1² Ev, Iz | Group 1² Eb, Ib³ | Group 1² Ev, Ib | TEST Eb, Gb | TEST Ev, Gv | XCHG Eb, Gb | XCHG Ev, Gv
9 | XCHG r8, rAX / NOP, PAUSE | XCHG rCX/r9, rAX | XCHG rDX/r10, rAX | XCHG rBX/r11, rAX | XCHG rSP/r12, rAX | XCHG rBP/r13, rAX | XCHG rSI/r14, rAX | XCHG rDI/r15, rAX
A | MOV AL, Ob | MOV rAX, Ov | MOV Ob, AL | MOV Ov, rAX | MOVSB Yb, Xb | MOVSW/D/Q Yv, Xv | CMPSB Xb, Yb | CMPSW/D/Q Xv, Yv
B | MOV AL, Ib / r8b, Ib | MOV CL, Ib / r9b, Ib | MOV DL, Ib / r10b, Ib | MOV BL, Ib / r11b, Ib | MOV AH, Ib / r12b, Ib | MOV CH, Ib / r13b, Ib | MOV DH, Ib / r14b, Ib | MOV BH, Ib / r15b, Ib
C | Group 2² Eb, Ib | Group 2² Ev, Ib | RET near Iw | RET near | LES³ Gz, Mp / VEX escape prefix | LDS³ Gz, Mp / VEX escape prefix | Group 11² Eb, Ib | Group 11² Ev, Iz
D | Group 2² Eb, 1 | Group 2² Ev, 1 | Group 2² Eb, CL | Group 2² Ev, CL | AAM Ib³ | AAD Ib³ | invalid | XLAT, XLATB
E | LOOPNE/NZ Jb | LOOPE/Z Jb | LOOP Jb | JrCXZ Jb | IN AL, Ib | IN eAX, Ib | OUT Ib, AL | OUT Ib, eAX
F | LOCK Prefix | INT1 | REPNE Prefix | REP / REPE Prefix | HLT | CMC | Group 3² Eb | Group 3² Ev

Notes:
1. Rows in this table show the high opcode nibble, columns show the low opcode nibble (both in hexadecimal).
2. An opcode extension is specified using the reg field of the ModRM byte (ModRM bits [5:3]) which follows the opcode.
See Table A-6 on page 515 for details.
3. Invalid in 64-bit mode.
4. Valid only in 64-bit mode.
5. Used as REX prefixes in 64-bit mode.
6. This is a null prefix in 64-bit mode.
Table A-2. Primary Opcode Map (One-byte Opcodes), Low Nibble 8-Fh

Nibble¹ | 8 | 9 | A | B | C | D | E | F
0 | OR Eb, Gb | OR Ev, Gv | OR Gb, Eb | OR Gv, Ev | OR AL, Ib | OR rAX, Iz | PUSH CS³ | escape to secondary opcode map
1 | SBB Eb, Gb | SBB Ev, Gv | SBB Gb, Eb | SBB Gv, Ev | SBB AL, Ib | SBB rAX, Iz | PUSH DS³ | POP DS³
2 | SUB Eb, Gb | SUB Ev, Gv | SUB Gb, Eb | SUB Gv, Ev | SUB AL, Ib | SUB rAX, Iz | seg CS⁶ | DAS³
3 | CMP Eb, Gb | CMP Ev, Gv | CMP Gb, Eb | CMP Gv, Ev | CMP AL, Ib | CMP rAX, Iz | seg DS⁶ | AAS³
4 | DEC eAX / REX prefix⁵ | DEC eCX / REX prefix⁵ | DEC eDX / REX prefix⁵ | DEC eBX / REX prefix⁵ | DEC eSP / REX prefix⁵ | DEC eBP / REX prefix⁵ | DEC eSI / REX prefix⁵ | DEC eDI / REX prefix⁵
5 | POP rAX/r8 | POP rCX/r9 | POP rDX/r10 | POP rBX/r11 | POP rSP/r12 | POP rBP/r13 | POP rSI/r14 | POP rDI/r15
6 | PUSH Iz | IMUL Gv, Ev, Iz | PUSH Ib | IMUL Gv, Ev, Ib | INSB Yb, DX | INSW/D Yz, DX | OUTSB DX, Xb | OUTSW/D DX, Xz
7 | JS Jb | JNS Jb | JP Jb | JNP Jb | JL Jb | JNL Jb | JLE Jb | JNLE Jb
8 | MOV Eb, Gb | MOV Ev, Gv | MOV Gb, Eb | MOV Gv, Ev | MOV Mw/Rv, Sw | LEA Gv, M | MOV Sw, Ew | Group 1a² / XOP escape prefix
9 | CBW, CWDE, CDQE | CWD, CDQ, CQO | CALL³ Ap | WAIT, FWAIT | PUSHF/D/Q Fv | POPF/D/Q Fv | SAHF | LAHF
A | TEST AL, Ib | TEST rAX, Iz | STOSB Yb, AL | STOSW/D/Q Yv, rAX | LODSB AL, Xb | LODSW/D/Q rAX, Xv | SCASB AL, Yb | SCASW/D/Q rAX, Yv
B | MOV rAX, Iv / r8, Iv | MOV rCX, Iv / r9, Iv | MOV rDX, Iv / r10, Iv | MOV rBX, Iv / r11, Iv | MOV rSP, Iv / r12, Iv | MOV rBP, Iv / r13, Iv | MOV rSI, Iv / r14, Iv | MOV rDI, Iv / r15, Iv
C | ENTER Iw, Ib | LEAVE | RET far Iw | RET far | INT3 | INT Ib | INTO³ | IRET, IRETD, IRETQ
D | x87 instructions, see Table A-15 on page 526 (all columns)
E | CALL Jz | JMP Jz | JMP Ap³ | JMP Jb | IN AL, DX | IN eAX, DX | OUT DX, AL | OUT DX, eAX
F | CLC | STC | CLI | STI | CLD | STD | Group 4² Eb | Group 5²

Notes:
1. Rows in this table show the high opcode nibble, columns show the low opcode nibble (both in hexadecimal).
2. An opcode extension is specified using the reg field of the ModRM byte (ModRM bits [5:3]) which follows the opcode.
See Table A-6 on page 515 for details.
3. Invalid in 64-bit mode.
4. Valid only in 64-bit mode.
5. Used as REX prefixes in 64-bit mode.
6. This is a null prefix in 64-bit mode.


Secondary Opcode Map. As described in “Encoding Syntax” on page 1, the escape code 0Fh
indicates the switch from the primary to the secondary opcode map. In legacy terminology, the
secondary opcode map is presented as a listing of “two-byte” opcodes where the first byte is 0Fh.
Tables A-3 and A-4 show the secondary opcode map.

Table A-3 below shows those instructions for which the low nibble is in the range 0-7h. Table A-4 on
page 512 shows those instructions for which the low nibble is in the range 8-Fh. In both tables, the
rows show the full range (0-Fh) of the high nibble, and the columns show the specified range of the
low nibble. Note the added column labeled “prefix.”


For the secondary opcode map shown below, the legacy prefixes 66h, F2h, and F3h are repurposed to
provide additional opcode encoding space. For those rows that utilize them, the presence of a 66h,
F2h, or F3h prefix changes the operation or the operand types specified by the corresponding opcode
value.
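
As one concrete instance of this prefix repurposing, Table A-4 lists four different instructions for secondary opcode 58h depending on the prefix. The C sketch below encodes just that one table cell. The function name is hypothetical, and passing 0 to mean "no prefix" is a convention chosen only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Mnemonic selected by the prefix for the two-byte opcode 0F 58.
 * prefix: 0 (none), 0x66, 0xF3 or 0xF2. Returns NULL for any other value. */
const char *decode_0f_58(uint8_t prefix)
{
    switch (prefix) {
    case 0x00: return "ADDPS";  /* no prefix: packed single-precision */
    case 0x66: return "ADDPD";  /* 66h: packed double-precision */
    case 0xF3: return "ADDSS";  /* F3h: scalar single-precision */
    case 0xF2: return "ADDSD";  /* F2h: scalar double-precision */
    default:   return NULL;
    }
}
```

A full decoder would index a table by prefix and opcode rather than use a switch per opcode; the switch form is used here only to keep the example short.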


As discussed in “Encoding Extensions Using the ModRM Byte” on page 515, some opcode values 
represent a group of instructions. This is denoted in the map entry by “Group n”, where n = [1:17,P]. 
Instructions within a group are encoded by the reg field of the ModRM byte. These encodings are 
specified in Table A-7 on page 517. For some opcodes, both the reg and the r/m field of the ModRM 
byte are used to extend the encoding. See Table A-8 on page 519. 
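
Selecting an instruction within a group therefore requires splitting the ModRM byte into its fields. The following C sketch shows that split. The reg field position (bits [5:3]) is given in the table notes of this appendix; the mod (bits [7:6]) and r/m (bits [2:0]) positions are the standard ModRM layout described in Chapter 1 rather than in this section. The type and function names are hypothetical.

```c
#include <stdint.h>

/* The three fields of a ModRM byte. */
struct modrm_fields {
    unsigned mod;  /* bits [7:6] */
    unsigned reg;  /* bits [5:3], the /n opcode extension for group opcodes */
    unsigned rm;   /* bits [2:0] */
};

struct modrm_fields modrm_split(uint8_t modrm)
{
    struct modrm_fields f;
    f.mod = (modrm >> 6) & 0x3u;
    f.reg = (modrm >> 3) & 0x7u;
    f.rm  = modrm & 0x7u;
    return f;
}
```

For example, a ModRM byte of D8h (11 011 000b) yields mod = 11b, reg = 3, and r/m = 0, so a group opcode followed by D8h selects the /3 entry of that group with a register operand.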
Table A-3. Secondary Opcode Map (Two-byte Opcodes), Low Nibble 0-7h


Prefix |Nibble+ 0 1 2 3 4 5 6 7 
2 2 LAR LSL 
nia 0 Group 6 Group 7 Gv, Ew Gv, Ew SYSCALL CLTS SYSRET 
MOVLPS MOVHPS 
MOVUPS Vq, Mq MOVLPS | UNPCKLPS | UNPCKHPS | Vo.q, Mq MOVHPS 
pee MovHLPs | Mq,Vq | Vps,Wps | Vps,Wps | MOVLHPS |  Mq, Vo.q 
Vpe; Wee: | Be) NBs Vo.q, Uo.q Vo.q, Uo.q 
F3 MOVSS MOVSLDUP MOVSHDUP 
1 Vss, Wss Wss, Vss Vps, Wps Vps, Wps 
66 MOVUPD MOVLPD UNPCKLPD | UNPCKHPD MOVHPD 
Vpd, Wed | Wpd, Vpd Vo.q, Mq Mq, Vo.q_ | Vo.q, Wo.q | Vo.q, Wo.q | Vo.q, Mq Mq, Vo.q 
F2 MOVSD MOVDDUP 
Vsd,Wsd | Wsd, Vsd Vo, Wsd 
mov* 
nla 2 
Rdiq, Cd/q | Rd/q, Dd/q | Cd/q, Rd/q | Ddlq, Rd/q 
nia 3 WRMSR RDTSC RDMSR RDPMC SYSENTER?| SYSEXIT? 
Ala 4 CMOVO CMOVNO CMOVB CMOVNB CMOVZ CMOVNZ CMOVBE CMOVNBE 
Gv, Ev Gv, Ev Gv, Ev Gv, Ev Gv, Ev Gv, Ev Gv, Ev Gv, Ev 
wone MOVMSKPS SQRTPS RSQRTPS RCPPS ANDPS ANDNPS ORPS XORPS 
Gd, Ups Vps, Wps Vps, Wps Vps, Wps Vps, Wps Vps, Wps Vps, Wps Vps, Wps 
F3 SQRTSS RSQRTSS RCPSS 
5 Vss, Wss Vss, Wss Vss, Wss 
66 MOVMSKPD| SQRTPD ANDPD ANDNPD ORPD XORPD 
Gd, Upd Vpd, Wpd Vpd, Wpd Vpd, Wpd Vpd, Wpd Vpd, Wpd 
SQRTSD 
Fe Vsd, Wsd 
PUNPCK- | PUNPCK- | PUNPCK- |5acksswB| PCMPGTB | PcmPGTw | PcMPGTD | PACKUSWB 
nen LBW Ewe ged Ppi,Qpi_ | Ppk,Qpk | Ppi,Qpi | Ppij, Opi Ppi, Qpi 
Pq, Od Pq, Qd Pq, Od pi, Qp pk, Qp pi, Qp Pj, Qpj pi, Qp 
F3 
6 PUNPCK: PUNPCK PUNPCK: 
: : ~ |PACKSSWB| PCMPGTB | PCMPGTW | PCMPGTD | PACKUSWB 
66 EBW ENP pats Vpi, Wpi Vpk, Wpk Vpi, Wpi Vpj, Wpj Vpi, Wpi 
Vo.q, Wo.q | Vo.q, Wo.q | Vo.q, Wo.q pee Pes mP Bee PJ, NPI pee 
F2 
PSHUFW PCMPEQB | PCMPEQW | PCMPEQD 
none é ‘ ‘ EMMS 
Pq, Qq, Ib Ppk, Qpk Ppi, Qpi Ppj, Qpj 
PSHUFHW 
i 7 Wa aP | croup as | eroupas® |'6 14? 
66 PSHUFD P oe sy PCMPEQB | PCMPEQW | PCMPEQD 
Vo, Wo, Ib Vpk, Wpk Vpi, Wpi Vpj, Wpj 
PSHUFLW 
F2 
Vq, Wg, Ib 
Notes:
1. Rows show the high opcode nibble, columns show the low opcode nibble (both in hexadecimal). All opcodes in this
map are immediately preceded in the instruction encoding by the escape byte 0Fh.
2. An opcode extension is specified using the reg field of the ModRM byte (ModRM bits [5:3]) which follows the opcode.
See Table A-7 on page 517 for details.
3. Invalid in long mode.
4. Operand size is based on processor mode.
Table A-3. Secondary Opcode Map (Two-byte Opcodes), Low Nibble 0-7h (continued)


Prefix |Nibble2 0 1 2 3 4 5 6 7 
nla 8 JO Jz JNO Jz JB Jz JNB Jz JZ Iz JINZ Jz JBE Jz JNBE Jz 
nia 9 SETO Eb SETNO Eb SETB Eb SETNB Eb SETZ Eb SETNZ Eb | SETBE Eb SETNBE Eb 
SHLD 
nia A PUSH FS POP FS CPUID BT Ev, Gv 
Ev, Gv, Ib Ev, Gv, CL 
CMPXCHG MOVZX 
nla B LSS Gz, Mp | BTR Ev, Gv | LFS Gz, Mp | LGS Gz, Mp 
Eb, Gb Ev, Gv Gv, Eb Gv, Ew 
none XADD CMPPS MOVNTI A PEXTRW SHUFPS 
Vps, Wps, Ib My, Gy 4: is ’ | Gd, Nq, lb |Vps, Wps, Ib 
CMPSS 
F3 
Cc Vss, Wss, Ib Group 92 
66 — CMPPD PINS! PEXTRW SHUFPD Mq 
ESS Ev, Gv | Vpd, Wpd, Ib ve; Aes: Gd, Uo, Ib |Vpd, Wpd, Ib 
CMPSD 
F2 
Vsd, Wsd, Ib 
none PSRLW PSRLD PSRLQ PADDQ PMULLW PMOVMSKB 
Pq, Qq Pq, Qq Pq, Qq Pq, Qq Pq, Qq Gd, Nq 
MOVQ2D' 
F3 res 
D Vo, Nq 
66 ADDSUBPD PSRLW PSRLD PSRLQ PADDQ PMULLW MOVQ PMOVMSKB 
Vpd, Wpd Vo, Wo Vo, Wo Vo, Wo Vo, Wo Vo, Wo Wa, Vq Gd, Uo 
F2 ADDSUBPS MOVDQ2Q 
Vps, Wps Pq, Uq 
none PAVGB PSRAW PSRAD PAVGW PMULHUW | PMULHW MOVNTQ 
Pq, Qq Pq, Qq Pq, Qq Pq, Qq Pq, Qq Pq, Qq Mq, Pq 
CVTDQ2PD 
de Vpd, Wj 
66 PAVGB PSRAW PSRAD PAVGW PMULHUW | PMULHW /|CVTTPD2DQ;} MOVNTDQ 
Vo, Wo Vo, Wo Vo, Wo Vo, Wo Vo, Wo Vo, Wo Vpj, Wpd Mo, Vo 
CVTPD2DQ 
- Vpi, Wpd 
none PSLLW PSLLD PSLLQ PMULUDQ | PMADDWD | PSADBW MASKMOVQ 
Pq, Qq Pq, Qq Pq, Qq Pq, Qq Pq, Qq Pq, Qq Pq, Nq 
F3 
FE 
66 PSLLW PSLLD PSLLQ PMULUDQ | PMADDWD | PSADBW_ |MASKMOVDQU 
Vpw, Wo.q | Vpwd, Wo.q | Vpqw, Wo.q| Vpj, Wpj Vpi, Wpi Vpk, Wpk Vpb, Upb 
LDDQU 
F2 
Vo, Mo 
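Notes:
1. Rows show the high opcode nibble, columns show the low opcode nibble (both in hexadecimal). All opcodes in this
map are immediately preceded in the instruction encoding by the escape byte 0Fh.
2. An opcode extension is specified using the reg field of the ModRM byte (ModRM bits [5:3]) which follows the opcode.
See Table A-7 on page 517 for details.
3. Invalid in long mode.
4. Operand size is based on processor mode.

Table A-4. Secondary Opcode Map (Two-byte Opcodes), Low Nibble 8-Fh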


Prefix | Nibble* 8 9 A B Cc D E F 
Group P2 3DNow! 
WBINVD See 
nla INVD F3 UD2 FEMMS “2DNow!™ 
WENEnG PeerereH Opcodes” 
on page 522 
NOP? 
Group 162 NOP? NOP? NOP? NOP? (Fs) RDSSP NOP? NOP? 
nla reg=1, 
mod=11 
Sank MOVAPS CVTPI2PS | MOVNTPS | CVTTPS2PI | CVTPS2PI UCOMISS COMISS 
Vps, Wps Wps, Vps Vps, Qpj Mo, Vps Ppj, Wps Ppj, Wps Vss, Wss Vss, Wss 
F3 CVTSI2SS | MOVNTSS | CVTTSS2SI| CVTSS2SI 
Vss, Ey Md, Vss Gy, Wss Gy, Wss 
66 MOVAPD CVTPI2PD | MOVNTPD | CVTTPD2PI| CVTPD2PI | UCOMISD COMISD 
Vpd, Wpd Wpd, Vpd Vpd, Qpj Mo, Vpd Ppj, Wpd Ppj, Wpd Vsd, Wsd Vsd, Wsd 
F2 CVTSI2SD | MOVNTSD | CVTTSD2SI| CVTSD2SI 
Vsd, Ey Mq, Vsd Gy, Wsd Gy, Wsd 
Escape to Escape to 
nla OF_38h OF_3Ah 
opcode map opcode map 
sia CMOVS CMOVNS CMOVP CMOVNP CMOVL CMOVNL CMOVLE | CMOVNLE 
Gv, Ev Gv, Ev Gv, Ev Gv, Ev Gv, Ev Gv, Ev Gv, Ev Gv, Ev 
ADDPS MULPS CVTPS2PD | CVTDQ2PS SUBPS MINPS DIVPS MAXPS 
nen Vps, Wps Vps, Wps Vpd, Wps Vps, Wo Vps, Wps Vps, Wps Vps, Wps Vps, Wps 
F3 ADDSS MULSS CVTSS2SD |CVTTPS2DQ| SUBSS MINSS DIVSS MAXSS 
Vss, Wss Vss, Wss Vsd, Wss Vo, Wps Vss, Wss Vss, Wss Vss, Wss Vss, Wss 
66 ADDPD MULPD CVTPD2PS | CVTPS2DQ SUBPD MINPD DIVPD MAXPD 
Vpd, Wpd Vpd, Wpd Vps, Wpd Vo, Wps Vpd, Wpd Vpd, Wpd Vpd, Wpd | Vpd, Wpd 
F2 ADDSD MULSD CVTSD2SS SUBSD MINSD DIVSD MAXSD 
Vsd, Wsd Vsd, Wsd Vss, Wsd Vsd, Wsd Vsd, Wsd Vsd, Wsd Vsd, Wsd 
PUNPCK- PUNPCK- PUNPCK- 
wane HBW HWD HDQ PACKSSDW MOVD MOVQ 
Pq, Qd Pq, Qd Pq, Qd Pq, Qq Py, Ey Pq, Qq 
F3 MOVDQU 
Vo, Wo 
PUNPCK- PUNPCK- PUNPCK- PUNPCK- PUNPCKH- 
66 HBW HWD HDQ PACKSSDW LODO QDo MOVD MOVDQA 
Vo, Wq Vo, Wq Vo, Wq Vo, Wo Vo, Wq Vo, Wq vy, Ey Vo, Wo 
F2 
Notes:
1. Rows show the high opcode nibble, columns show the low opcode nibble (both in hexadecimal). All opcodes in this
map are immediately preceded in the instruction encoding by the escape byte 0Fh.
2. An opcode extension is specified using the reg field of the ModRM byte (ModRM bits [5:3]) which follows the opcode.
See Table A-7 on page 517 for details.
3. This instruction takes a ModRM byte.
Table A-4. Secondary Opcode Map (Two-byte Opcodes), Low Nibble 8-Fh


Prefix | Nibble¹ | 8 | 9 | A | B | C | D | E | F
none | 7 | | | | | | | MOVD Ey, Py | MOVQ Qq, Pq
F3 | 7 | | | | | | | MOVQ Vq, Wq | MOVDQU Wo, Vo
66 | 7 | Group 17² | EXTRQ Vo.q, Uo | | | HADDPD Vpd, Wpd | HSUBPD Vpd, Wpd | MOVD Ey, Vy | MOVDQA Wo, Vo
F2 | 7 | INSERTQ Vo.q, Uo.q, Ib, Ib | INSERTQ Vo.q, Uo | | | HADDPS Vps, Wps | HSUBPS Vps, Wps | |
n/a | 8 | JS Jz | JNS Jz | JP Jz | JNP Jz | JL Jz | JNL Jz | JLE Jz | JNLE Jz
n/a | 9 | SETS Eb | SETNS Eb | SETP Eb | SETNP Eb | SETL Eb | SETNL Eb | SETLE Eb | SETNLE Eb
n/a | A | PUSH GS | POP GS | RSM | BTS Ev, Gv | SHRD Ev, Gv, Ib | SHRD Ev, Gv, CL | Group 15² | IMUL Gv, Ev
none | B | | Group 10² | Group 8² Ev, Ib | BTC Ev, Gv | BSF Gv, Ev | BSR Gv, Ev | MOVSX Gv, Eb | MOVSX Gv, Ew
F3 | B | POPCNT Gv, Ev | | | | TZCNT Gv, Ev | LZCNT Gv, Ev | |
F2 | B | | | | | | | |
n/a | C | BSWAP rAX/r8 | BSWAP rCX/r9 | BSWAP rDX/r10 | BSWAP rBX/r11 | BSWAP rSP/r12 | BSWAP rBP/r13 | BSWAP rSI/r14 | BSWAP rDI/r15
none | D | PSUBUSB Pq, Qq | PSUBUSW Pq, Qq | PMINUB Pq, Qq | PAND Pq, Qq | PADDUSB Pq, Qq | PADDUSW Pq, Qq | PMAXUB Pq, Qq | PANDN Pq, Qq
F3 | D | | | | | | | |
66 | D | PSUBUSB Vo, Wo | PSUBUSW Vo, Wo | PMINUB Vo, Wo | PAND Vo, Wo | PADDUSB Vo, Wo | PADDUSW Vo, Wo | PMAXUB Vo, Wo | PANDN Vo, Wo
F2 | D | | | | | | | |
none | E | PSUBSB Pq, Qq | PSUBSW Pq, Qq | PMINSW Pq, Qq | POR Pq, Qq | PADDSB Pq, Qq | PADDSW Pq, Qq | PMAXSW Pq, Qq | PXOR Pq, Qq
F3 | E | | | | | | | |
66 | E | PSUBSB Vo, Wo | PSUBSW Vo, Wo | PMINSW Vo, Wo | POR Vo, Wo | PADDSB Vo, Wo | PADDSW Vo, Wo | PMAXSW Vo, Wo | PXOR Vo, Wo
F2 | E | | | | | | | |
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Table A-4. Secondary Opcode Map (Two-byte Opcodes), Low Nibble 8-Fh 


Prefix | Nibble | 8 | 9 | A | B | C | D | E | F
none | F | PSUBB Pq, Qq | PSUBW Pq, Qq | PSUBD Pq, Qq | PSUBQ Pq, Qq | PADDB Pq, Qq | PADDW Pq, Qq | PADDD Pq, Qq | UD0
F3 | F | | | | | | | | UD0
66 | F | PSUBB Vo, Wo | PSUBW Vo, Wo | PSUBD Vo, Wo | PSUBQ Vo, Wo | PADDB Vo, Wo | PADDW Vo, Wo | PADDD Vo, Wo | UD0
F2 | F | | | | | | | | UD0
Notes: 


1. Rows show the high opcode nibble, columns show the low opcode nibble (both in hexadecimal). All opcodes in this
map are immediately preceded in the instruction encoding by the escape byte 0Fh.

2. An opcode extension is specified using the reg field of the ModRM byte (ModRM bits [5:3]) which follows the opcode.
See Table A-7 on page 517 for details.


3. This instruction takes a ModRM byte. 


rFLAGS Condition Codes for CMOVcc, Jcc, and SETcc Instructions. Table A-5 shows 
the rFLAGS condition codes specified by the low nibble in the opcode of the CMOVcc, Jcc, and 
SETcc instructions. 


Table A-5. rFLAGS Condition Codes for CMOVcc, Jcc, and SETcc

Low Nibble of Opcode (hex) | rFLAGS Value | cc Mnemonic | Arithmetic Type | Condition(s)
0 | OF = 1 | O | Signed | Overflow
1 | OF = 0 | NO | Signed | No Overflow
2 | CF = 1 | B, C, NAE | Unsigned | Below, Carry, Not Above or Equal
3 | CF = 0 | NB, NC, AE | Unsigned | Not Below, No Carry, Above or Equal
4 | ZF = 1 | Z, E | Unsigned | Zero, Equal
5 | ZF = 0 | NZ, NE | Unsigned | Not Zero, Not Equal
6 | CF = 1 or ZF = 1 | BE, NA | Unsigned | Below or Equal, Not Above
7 | CF = 0 and ZF = 0 | NBE, A | Unsigned | Not Below or Equal, Above
8 | SF = 1 | S | Signed | Sign
9 | SF = 0 | NS | Signed | Not Sign
A | PF = 1 | P, PE | n/a | Parity, Parity Even
B | PF = 0 | NP, PO | n/a | Not Parity, Parity Odd
C | (SF xor OF) = 1 | L, NGE | Signed | Less than, Not Greater than or Equal to
D | (SF xor OF) = 0 | NL, GE | Signed | Not Less than, Greater than or Equal to
E | (SF xor OF) = 1 or ZF = 1 | LE, NG | Signed | Less than or Equal to, Not Greater than
F | (SF xor OF) = 0 and ZF = 0 | NLE, G | Signed | Not Less than or Equal to, Greater than
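
The lookup in Table A-5 can be sketched in a few lines of Python. This is an illustrative sketch, not part of the architecture definition: the names `CONDITION_CODES` and `condition` and the dictionary representation of rFLAGS are assumptions made for the example, and only the first cc mnemonic of each row is kept.

```python
# Sketch only: flag predicates transcribed from Table A-5.
# `flags` is a hypothetical dict holding the rFLAGS bits CF, PF, ZF, SF, OF as 0 or 1.
CONDITION_CODES = [
    ("O",   lambda f: f["OF"] == 1),
    ("NO",  lambda f: f["OF"] == 0),
    ("B",   lambda f: f["CF"] == 1),
    ("NB",  lambda f: f["CF"] == 0),
    ("Z",   lambda f: f["ZF"] == 1),
    ("NZ",  lambda f: f["ZF"] == 0),
    ("BE",  lambda f: f["CF"] == 1 or f["ZF"] == 1),
    ("NBE", lambda f: f["CF"] == 0 and f["ZF"] == 0),
    ("S",   lambda f: f["SF"] == 1),
    ("NS",  lambda f: f["SF"] == 0),
    ("P",   lambda f: f["PF"] == 1),
    ("NP",  lambda f: f["PF"] == 0),
    ("L",   lambda f: (f["SF"] ^ f["OF"]) == 1),
    ("NL",  lambda f: (f["SF"] ^ f["OF"]) == 0),
    ("LE",  lambda f: (f["SF"] ^ f["OF"]) == 1 or f["ZF"] == 1),
    ("NLE", lambda f: (f["SF"] ^ f["OF"]) == 0 and f["ZF"] == 0),
]


def condition(opcode, flags):
    """Return (cc mnemonic, condition satisfied?) for a CMOVcc, Jcc, or SETcc opcode byte."""
    name, predicate = CONDITION_CODES[opcode & 0x0F]  # the low nibble selects the row
    return name, predicate(flags)


# 0F 8C is JL; with SF = 1 and OF = 0 the "less than" condition holds.
print(condition(0x8C, {"CF": 0, "PF": 0, "ZF": 0, "SF": 1, "OF": 0}))
```

Because only the low nibble is examined, the same function serves the 4xh (CMOVcc), 8xh (Jcc), and 9xh (SETcc) rows of the secondary opcode map.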
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Encoding Extensions Using the ModRM Byte. The ModRM byte, which immediately 
follows the opcode byte, is used in certain instruction encodings to provide additional opcode bits with 
which to define the function of the instruction. ModRM bytes have three fields—mod, reg, and r/m, as 
shown in Figure A-1. 


Figure A-1. ModRM-Byte Fields 


In most cases, the reg field (bits [5:3]), and in some cases, the r/m field (bits [2:0]) provide the 
additional bits used to extend the encodings of the opcode byte. In the case of the x87 floating-point 
instructions, the entire ModRM byte is used to extend the opcode encodings. 
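
As a minimal sketch of the field layout described above (mod in bits [7:6], reg in bits [5:3], r/m in bits [2:0]), the helper below splits a ModRM byte into its three fields. The function name `split_modrm` is an assumption made for this example.

```python
def split_modrm(byte):
    """Split a ModRM byte into its (mod, reg, r/m) fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModRM must be a single byte")
    mod = (byte >> 6) & 0b11   # bits [7:6]
    reg = (byte >> 3) & 0b111  # bits [5:3], the usual opcode-extension field
    rm = byte & 0b111          # bits [2:0]
    return mod, reg, rm


print(split_modrm(0xD8))  # mod = 11b, reg = 011b, r/m = 000b
```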


Table A-6 shows how the ModRM.reg field is used to extend the range of opcodes in the primary
opcode map. The opcode ranges are organized into groups of opcode extensions. The group number is 
shown in the left-most column. These groups are referenced in the primary opcode map shown in 
Table A-1 on page 507 and Table A-2 on page 508. An entry of “n.a.” in the Prefix column means that 
prefixes are not applicable to the opcodes in that row. Prefixes only apply to certain 64-bit media and 
SSE instructions. 


Table A-7 on page 517 shows how the ModRM.reg field is used to extend the range of the opcodes in
the secondary opcode map. 


The /0 through /7 notation for the ModRM reg field (bits [5:3]) in the tables below means that the 
three-bit field contains a value from zero (000b) to 7 (111b). 
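
To illustrate the /0 through /7 notation, the following sketch resolves a Group 1 instruction (opcodes 80h through 83h in Table A-6) from the reg field of its ModRM byte. The names `GROUP1` and `group1_mnemonic` are assumptions made for the example; operand decoding and the 64-bit-mode restriction on opcode 82h are deliberately left out.

```python
# Group 1 row of Table A-6: index = ModRM.reg value (/0 through /7).
GROUP1 = ("ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP")


def group1_mnemonic(opcode, modrm):
    """Return the mnemonic selected by ModRM.reg for a Group 1 opcode."""
    if opcode not in (0x80, 0x81, 0x82, 0x83):
        raise ValueError("not a Group 1 opcode")
    return GROUP1[(modrm >> 3) & 0b111]


# 83 E8 ib: ModRM E8h has reg = 101b (/5), so the instruction is SUB.
print(group1_mnemonic(0x83, 0xE8))
```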


Table A-6. ModRM.reg Extensions for the Primary Opcode Map¹


Group Number | Prefix | Opcode | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
Group 1 | n/a | 80 | ADD Eb, Ib | OR Eb, Ib | ADC Eb, Ib | SBB Eb, Ib | AND Eb, Ib | SUB Eb, Ib | XOR Eb, Ib | CMP Eb, Ib
Group 1 | n/a | 81 | ADD Ev, Iz | OR Ev, Iz | ADC Ev, Iz | SBB Ev, Iz | AND Ev, Iz | SUB Ev, Iz | XOR Ev, Iz | CMP Ev, Iz
Group 1 | n/a | 82 | ADD² Eb, Ib | OR² Eb, Ib | ADC² Eb, Ib | SBB² Eb, Ib | AND² Eb, Ib | SUB² Eb, Ib | XOR² Eb, Ib | CMP² Eb, Ib
Group 1 | n/a | 83 | ADD Ev, Ib | OR Ev, Ib | ADC Ev, Ib | SBB Ev, Ib | AND Ev, Ib | SUB Ev, Ib | XOR Ev, Ib | CMP Ev, Ib
Notes:
1. See Table A-7 on page 517 for ModRM extensions for the secondary (two-byte) opcode map.
2. Invalid in 64-bit mode.
3. This instruction takes a ModRM byte.
4. Reserved prefetch encodings are aliased to the /0 encoding (PREFETCH Exclusive) for future compatibility.
5. Redundant encoding generally unsupported by tools.




Table A-6. ModRM.reg Extensions for the Primary Opcode Map¹ (continued)


Group Number | Prefix | Opcode | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
Group 1a | n/a | 8F | POP Ev | XOP | XOP | XOP | XOP | XOP | XOP | XOP
Group 2 | n/a | C0 | ROL Eb, Ib | ROR Eb, Ib | RCL Eb, Ib | RCR Eb, Ib | SHL/SAL Eb, Ib | SHR Eb, Ib | SHL/SAL⁵ Eb, Ib | SAR Eb, Ib
Group 2 | n/a | C1 | ROL Ev, Ib | ROR Ev, Ib | RCL Ev, Ib | RCR Ev, Ib | SHL/SAL Ev, Ib | SHR Ev, Ib | SHL/SAL⁵ Ev, Ib | SAR Ev, Ib
Group 2 | n/a | D0 | ROL Eb, 1 | ROR Eb, 1 | RCL Eb, 1 | RCR Eb, 1 | SHL/SAL Eb, 1 | SHR Eb, 1 | SHL/SAL⁵ Eb, 1 | SAR Eb, 1
Group 2 | n/a | D1 | ROL Ev, 1 | ROR Ev, 1 | RCL Ev, 1 | RCR Ev, 1 | SHL/SAL Ev, 1 | SHR Ev, 1 | SHL/SAL⁵ Ev, 1 | SAR Ev, 1
Group 2 | n/a | D2 | ROL Eb, CL | ROR Eb, CL | RCL Eb, CL | RCR Eb, CL | SHL/SAL Eb, CL | SHR Eb, CL | SHL/SAL⁵ Eb, CL | SAR Eb, CL
Group 2 | n/a | D3 | ROL Ev, CL | ROR Ev, CL | RCL Ev, CL | RCR Ev, CL | SHL/SAL Ev, CL | SHR Ev, CL | SHL/SAL⁵ Ev, CL | SAR Ev, CL
Group 3 | n/a | F6 | TEST Eb, Ib | | NOT Eb | NEG Eb | MUL Eb | IMUL Eb | DIV Eb | IDIV Eb
Group 3 | n/a | F7 | TEST Ev, Iz | | NOT Ev | NEG Ev | MUL Ev | IMUL Ev | DIV Ev | IDIV Ev
Group 4 | n/a | FE | INC Eb | DEC Eb | | | | | |
Group 5 | n/a | FF | INC Ev | DEC Ev | CALL Ev | CALL Mp | JMP Ev | JMP Mp | PUSH Ev |
Group 11 | n/a | C6 | MOV Eb, Ib | | | | | | |
Group 11 | n/a | C7 | MOV Ev, Iz | | | | | | |
Notes:
1. See Table A-7 on page 517 for ModRM extensions for the secondary (two-byte) opcode map.
2. Invalid in 64-bit mode.
3. This instruction takes a ModRM byte.
4. Reserved prefetch encodings are aliased to the /0 encoding (PREFETCH Exclusive) for future compatibility.
5. Redundant encoding generally unsupported by tools.
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Table A-7. ModRM.reg Extensions for the Secondary Opcode Map 


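Group Number | Prefix | Opcode | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
Group 6 | n/a | 0F 00 | SLDT Mw/Rv | STR Mw/Rv | LLDT Ew | LTR Ew | VERR Ew | VERW Ew | |
Group 7 | n/a | 0F 01 | SGDT Ms | SIDT Ms; MONITOR¹, MWAIT | LGDT Ms; XGETBV, XSETBV | LIDT Ms; SVM¹ | SMSW Mw/Rv | RSTORSSP (mod != 11) | LMSW Ew | INVLPG Mb; SWAPGS, RDTSCP
Group 8 | n/a | 0F BA | | | | | BT Ev, Ib | BTS Ev, Ib | BTR Ev, Ib | BTC Ev, Ib
Group 9 | none | 0F C7 | | CMPXCHG8B Mq; CMPXCHG16B Mo | | | | | RDRAND Rv | RDSEED Rv
Group 9 | 66 | 0F C7 | | | | | | | |
Group 9 | F2 | 0F C7 | | | | | | | |
Group 9 | F3 | 0F C7 | | | | | | | | RDPID Rd/q
Group 10 | n/a | 0F B9 | UD1 | UD1 | UD1 | UD1 | UD1 | UD1 | UD1 | UD1
Group 12 | none | 0F 71 | | | PSRLW Nq, Ib | | PSRAW Nq, Ib | | PSLLW Nq, Ib |
Group 12 | 66 | 0F 71 | | | PSRLW Uo, Ib | | PSRAW Uo, Ib | | PSLLW Uo, Ib |
Group 12 | F2, F3 | 0F 71 | | | | | | | |
Group 13 | none | 0F 72 | | | PSRLD Nq, Ib | | PSRAD Nq, Ib | | PSLLD Nq, Ib |
Group 13 | 66 | 0F 72 | | | PSRLD Uo, Ib | | PSRAD Uo, Ib | | PSLLD Uo, Ib |
Group 13 | F2, F3 | 0F 72 | | | | | | | |
Group 14 | none | 0F 73 | | | PSRLQ Nq, Ib | | | | PSLLQ Nq, Ib |
Group 14 | 66 | 0F 73 | | | PSRLQ Uo, Ib | PSRLDQ Uo, Ib | | | PSLLQ Uo, Ib | PSLLDQ Uo, Ib
Group 14 | F2, F3 | 0F 73 | | | | | | | |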
Notes: 
1. Opcode is extended further using the r/m field of the ModRM byte in conjunction with the reg field. See Table A-8 
on page 519 for ModRM.r/m extensions of this opcode. 
2. Invalid in 64-bit mode. 
3. This instruction takes a ModRM byte. 
4. Reserved prefetch encodings are aliased to the /0 encoding (PREFETCH Exclusive) for future compatibility.
5. ModRM.mod = 11b.
6. ModRM.mod ≠ 11b.
7. ModRM.mod ≠ 11b; ModRM.mod = 11b is an invalid encoding.




Table A-7. ModRM.reg Extensions for the Secondary Opcode Map 


Group Number | Prefix | Opcode | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
Group 15 | none | 0F AE | FXSAVE M | FXRSTOR M | LDMXCSR Md | STMXCSR Md | XSAVE M | LFENCE⁵; XRSTOR M⁶ | MFENCE⁵; XSAVEOPT M⁶ | SFENCE⁵; CLFLUSH Mb⁶
Group 15 | F3 | 0F AE | RDFSBASE Rv | RDGSBASE Rv | WRFSBASE Rv | WRGSBASE Rv | | INCSSP | CLRSSBSY |
Group 15 | F2 | 0F AE | | | | | | | |
Group 15 | 66 | 0F AE | | | | | | | CLWB Mb⁶ |
Group 16 | n/a | 0F 18 | PREFETCH NTA | PREFETCH T0 | PREFETCH T1 | PREFETCH T2 | NOP | NOP | NOP | NOP
Group 17 | 66 | 0F 78 | EXTRQ Vo.q, Ib, Ib | | | | | | |
Group 17 | none, F2, F3 | 0F 78 | | | | | | | |
Group P | n/a | 0F 0D | PREFETCH Exclusive | PREFETCH Modified | PREFETCH⁴ | PREFETCH Modified | PREFETCH⁴ | PREFETCH⁴ | PREFETCH⁴ | PREFETCH⁴
Notes: 
1. Opcode is extended further using the r/m field of the ModRM byte in conjunction with the reg field. See Table A-8 
on page 519 for ModRM.r/m extensions of this opcode. 
2. Invalid in 64-bit mode. 
3. This instruction takes a ModRM byte. 
4. Reserved prefetch encodings are aliased to the /0 encoding (PREFETCH Exclusive) for future compatibility.
5. ModRM.mod = 11b.
6. ModRM.mod ≠ 11b.
7. ModRM.mod ≠ 11b; ModRM.mod = 11b is an invalid encoding.
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Secondary Opcode Map, ModRM Extensions for Opcode 01h. Table A-8 below shows 
the ModRM byte encodings for the 01h opcode. In the table the full ModRM byte is listed below the 
instruction in hexadecimal, with ellipses representing the [0Fh, 01h] opcode bytes.


Table A-8. Opcode 01h ModRM Extensions 


reg Field | Prefix | r/m 0 | r/m 1 | r/m 2 | r/m 3 | r/m 4 | r/m 5 | r/m 6 | r/m 7
/1 | none | MONITOR (...C8) | MWAIT (...C9) | CLAC (...CA) | STAC (...CB) | | | |
/2 | none | XGETBV (...D0) | XSETBV (...D1) | | | | | |
/3 | none | VMRUN (...D8) | VMMCALL (...D9) | VMLOAD (...DA) | VMSAVE (...DB) | STGI (...DC) | CLGI (...DD) | SKINIT (...DE) | INVLPGA (...DF)
/3 | F3, F2 | | VMGEXIT (...D9) | | | | | |
/5 | none | | | | | | | RDPKRU | WRPKRU
/5 | F3 | SETSSBSY | | SAVEPREVSSP | | | | |
/7 | none | SWAPGS (...F8) | RDTSCP (...F9) | MONITORX (...FA) | MWAITX (...FB) | | RDPRU (...FD) | |
/7 | F3 | | | MCOMMIT (F3...FA) | | | | RMPADJUST (F3...FE) | PSMASH (F3...FF)
/7 | F2 | | | | | | | RMPUPDATE (F2...FE) | PVALIDATE (F2...FF)

Note: All encodings in this table have ModRM.mod = 11b.


0F_38h and 0F_3Ah Opcode Maps. The 0F_38h and 0F_3Ah opcode maps are used primarily
to encode the legacy SSE instructions. In legacy terminology, these maps are presented as three-byte
opcodes where the first two bytes are {0Fh, 38h} and {0Fh, 3Ah} respectively.


In these maps the legacy prefixes F2h and F3h are repurposed to provide additional opcode encoding 
space. In rows [0:E] the legacy prefix 66h is also used to modify the opcode. However, in row F, 66h is 
used as an operand-size override. See the CRC32 instruction as an example. 


The 0F_38h opcode map is presented below in Tables A-9 and A-10. The 0F_3Ah opcode map is
presented in Tables A-11 and A-12.
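
The escape-byte selection of an opcode map can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `opcode_map` is hypothetical, the input is assumed to start at the first opcode byte (legacy and REX prefixes already consumed), and VEX, XOP, and 3DNow! encodings are not handled.

```python
def opcode_map(code):
    """Classify opcode bytes and return (map name, offset of the opcode byte in `code`)."""
    if not code:
        raise ValueError("empty instruction")
    if code[0] != 0x0F:
        return "primary", 0        # one-byte opcode
    if len(code) < 2:
        raise ValueError("truncated escape sequence")
    if code[1] == 0x38:
        return "0F_38h", 2         # {0Fh, 38h} escape
    if code[1] == 0x3A:
        return "0F_3Ah", 2         # {0Fh, 3Ah} escape
    return "secondary", 1          # 0Fh escape only


# CRC32 (F2 0F 38 F1 /r) with the F2 prefix already stripped.
print(opcode_map(bytes([0x0F, 0x38, 0xF1, 0xC3])))
```

A real decoder would also carry the 66h, F2h, and F3h prefixes forward, because in these maps they take part in selecting the instruction rather than acting as ordinary prefixes.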
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Table A-9. OF_38h Opcode Map, Low Nibble = [0h:7h] 


Prefix | Opcode x0 pal x2 x3 x4 x5 x6 x7. 
Aahie PSHUFB PHADDW PHADDD PHADDSW PMADDUBSW PHSUBW PHSUBD PHSUBSW 
5s Ppb, Qpb Ppi, Qpi Ppj, Qpj Ppi, Qpi Ppk, Qpk Ppi, Qpi Ppj, Qpj Ppi, Qpi 
66 PSHUFB PHADDW PHADDD PHADDSW PMADDUBSW PHSUBW PHSUBD PHSUBSW 
PVb, Wpb Vpi, Wpi Vpj, Vpj Vpi, Wpi Vpk, Wpk Vpi, Wpi Vpj, Wpj Vpi, Wpi 
none 
66 ix PBLENDVB BLENDVPS PBLENDVB PTEST 
Vpb, Wpb Vps, Wps Vpb, Wpb Vo, Wo 
none 
66 2x PMOVSXBW PMOVSXBD PMOVSXBQ_ | PMOVSXWD PMOVSXWQ PMOVSXDQ 
Vpi, Wpk Vpj, Wpk Vpq, Wpk Vpj, Wpi Vpq, Wpi Vpq, Wpj 
none 
66 3x PMOVZXBW PMOVZXBD PMOVZXBQ_ | PMOVZXWD PMOVZXWQ PMOVZXDQ PCMPGTQ 
Vpi, Wpk Vpj, Wpk Vpq, Wpk Vpj, Wpi Vpq, Wpi Vpq, Wpj Vpq, Wpq 
none 
66 4x PMULLD PHMINPOSUW 
Vpj, Wpj Vpi, Wpi 
5x-Ex 
Kane MOVBE MOVBE WRSS 
Gv, Mv Mv, Gv My, Gy 
F2 CRC32 CRC32 
Gy, Eb Gy, Ev 
66 Fx MOVBE MOVBE WRUSS 
Gv, Mv Gv, Mv My, Gy 
66 CRC32 CRC32 
and Gy, Eb Gy, Ev 
F2 
Table A-10. OF_38h Opcode Map, Low Nibble = [8h:Fh] 
Prefix] Opcode x8 x9 xA xB xC xD xE xF 
PSIGNB PSIGNW PSIGND PMULHRSW 
none Ppk, Qpk Ppi, Qpi Ppj, Qpj Ppi, Qpi 
m% PSIGNB PSIGNW PSIGND PMULHRSW 
66 Vpj, Wj 
PABSB PABSW PABSD 
none Ppk, Qpk Ppi, Qpi Ppj, Qpj 
cm—_ ix 
PABSB PABSW PABSD 
66 Vpk, Wpk Vpi, Wpi Vpj, Wpj 
none 
pm—_ 2 
7 PMULDQ PCMPEQQ MOVNTDQA PACKUSDW 
66 Vpq, Wpj Vpq, Wpq Vo, Mo Vpi, Wpj 
none 
3 
* PMINSB PMINSD PMINUW PMINUD PMAXSB PMAXSD PMAXUW PMAXUD 
66 Vpk, pk Vpj, Wpj Vpi, Wpi Vpj, Wpj Vpk, Wek Vpj, Wpj Vpi, Wpi Vpj, Wpj 
AESIMC AESENC AESENCLAST AESDEC AESDECLAST 
66 Dx Vo, Wo Vo, Wo Vo, Wo Vo, Wo Vo, Wo 
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Table A-11. OF_3Ah Opcode Map, Low Nibble = [0h:7h] 
Prefix | Opcode | x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7
none | 1x | | | | | | | |
66 | 1x | | | | | PEXTRB Mb, Vpk, Ib; PEXTRB Ry, Vpk, Ib | PEXTRW Mw, Vpw, Ib; PEXTRW Ry, Vpw, Ib | PEXTRD Ed, Vpj, Ib; PEXTRQ¹ Eq, Vpq, Ib | EXTRACTPS Md, Vps, Ib; EXTRACTPS Ry, Vps, Ib
none | 2x | | | | | | | |
66 | 2x | PINSRB Vpk, Mb, Ib; PINSRB Vpk, Ry, Ib | INSERTPS Vps, Md, Ib; INSERTPS Vps, Uo, Ib | PINSRD Vpj, Ed, Ib; PINSRQ¹ Vpq, Eq, Ib | | | | |
none | 4x | | | | | | | |
66 | 4x | DPPS Vps, Wps, Ib | DPPD Vpd, Wpd, Ib | MPSADBW Vpk, Wpk, Ib | | PCLMULQDQ Vpq, Wpq, Ib | | |
66 | 6x | PCMPESTRM Vo, Wo, Ib | PCMPESTRI Vo, Wo, Ib | PCMPISTRM Vo, Wo, Ib | PCMPISTRI Vo, Wo, Ib | | | |
Note 1: When REX prefix is present


Table A-12. OF_3Ah Opcode Map, Low Nibble = [8h:Fh] 


2x PINSRB INSERTPS PINSRD 
7 Vpk, Mb, Ib Vps, Md, Ib Vpj, Ed, Ib 
PINSRB INSERTPS PINSRQ* 
Vpk, Ry, Ib Vps, Uo, Ib Vpq, Eq, Ib 
: = 
none 
4x 
DPPS DPPD MPSADBW PCLMULQDQ 
66 Vps, Wps, Ib Vpd, Wpd, Ib Vpk, Wpk, Ib Vpq, Wpgq, Ib 
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A.1.2 3DNow!™ Opcodes 


The 64-bit media instructions include the MMX™ instructions and the AMD 3DNow!™ instructions.
The MMX instructions are encoded using two opcode bytes, as described in “Secondary Opcode Map”
on page 508.


The 3DNow! instructions are encoded using two 0Fh opcode bytes and an immediate byte that is
located at the last byte position of the instruction encoding. Thus, the format for 3DNow! instructions
is:


0Fh 0Fh [ModRM] [SIB] [displacement] imm8_opcode


Table A-13 and Table A-14 on page 524 show the immediate byte following the opcode bytes for
3DNow! instructions. In these tables, rows show the high nibble of the immediate byte, and columns
show the low nibble of the immediate byte. Table A-13 shows the immediate bytes whose low nibble
is in the range 0-7h. Table A-14 shows the same for immediate bytes whose low nibble is in the range
8-Fh.


Byte values shown as reserved in these tables have implementation-specific functions, which can 
include an invalid-opcode exception. 
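
Because the operation is selected by the trailing immediate byte, a decoder has to parse the ModRM, SIB, and displacement bytes before it knows which 3DNow! instruction it is looking at. The sketch below assumes that parsing has already been done and that the full instruction is passed in; the names `THREEDNOW` and `decode_3dnow` are assumptions made for the example, and the table is transcribed from Tables A-13 and A-14.

```python
# Immediate-byte opcodes from Tables A-13 and A-14.
THREEDNOW = {
    0x0C: "PI2FW",   0x0D: "PI2FD",   0x1C: "PF2IW",    0x1D: "PF2ID",
    0x8A: "PFNACC",  0x8E: "PFPNACC",
    0x90: "PFCMPGE", 0x94: "PFMIN",   0x96: "PFRCP",    0x97: "PFRSQRT",
    0x9A: "PFSUB",   0x9E: "PFADD",
    0xA0: "PFCMPGT", 0xA4: "PFMAX",   0xA6: "PFRCPIT1", 0xA7: "PFRSQIT1",
    0xAA: "PFSUBR",  0xAE: "PFACC",
    0xB0: "PFCMPEQ", 0xB4: "PFMUL",   0xB6: "PFRCPIT2", 0xB7: "PMULHRW",
    0xBB: "PSWAPD",  0xBF: "PAVGUSB",
}


def decode_3dnow(insn):
    """Return the mnemonic for a complete 3DNow! instruction (0F 0F ModRM ... imm8)."""
    if len(insn) < 4 or insn[0] != 0x0F or insn[1] != 0x0F:
        raise ValueError("not a 3DNow! encoding")
    return THREEDNOW.get(insn[-1], "reserved")  # last byte selects the operation


# 0F 0F C1 9E: ModRM C1h (register operands), immediate 9Eh.
print(decode_3dnow(bytes([0x0F, 0x0F, 0xC1, 0x9E])))
```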
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Table A-13. Immediate Byte for 3DNow!™ Opcodes, Low Nibble 0-7h 
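Nibble | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
0 | | | | | | | |
1 | | | | | | | |
2 | | | | | | | |
3 | | | | | | | |
4 | | | | | | | |
5 | | | | | | | |
6 | | | | | | | |
7 | | | | | | | |
8 | | | | | | | |
9 | PFCMPGE Pq, Qq | | | | PFMIN Pq, Qq | | PFRCP Pq, Qq | PFRSQRT Pq, Qq
A | PFCMPGT Pq, Qq | | | | PFMAX Pq, Qq | | PFRCPIT1 Pq, Qq | PFRSQIT1 Pq, Qq
B | PFCMPEQ Pq, Qq | | | | PFMUL Pq, Qq | | PFRCPIT2 Pq, Qq | PMULHRW Pq, Qq
C | | | | | | | |
D | | | | | | | |
E | | | | | | | |
F | | | | | | | |
Notes:
1. All 3DNow!™ opcodes consist of two 0Fh bytes. This table shows the immediate byte for 3DNow! opcodes. Rows
show the high nibble of the immediate byte. Columns show the low nibble of the immediate byte.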
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Table A-14. Immediate Byte for 3DNow!™ Opcodes, Low Nibble 8-Fh 
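Nibble | 8 | 9 | A | B | C | D | E | F
0 | | | | | PI2FW Pq, Qq | PI2FD Pq, Qq | |
1 | | | | | PF2IW Pq, Qq | PF2ID Pq, Qq | |
2 | | | | | | | |
3 | | | | | | | |
4 | | | | | | | |
5 | | | | | | | |
6 | | | | | | | |
7 | | | | | | | |
8 | | | PFNACC Pq, Qq | | | | PFPNACC Pq, Qq |
9 | | | PFSUB Pq, Qq | | | | PFADD Pq, Qq |
A | | | PFSUBR Pq, Qq | | | | PFACC Pq, Qq |
B | | | | PSWAPD Pq, Qq | | | | PAVGUSB Pq, Qq
C | | | | | | | |
D | | | | | | | |
E | | | | | | | |
F | | | | | | | |
Notes:
1. All 3DNow!™ opcodes consist of two 0Fh bytes. This table shows the immediate byte for 3DNow! opcodes. Rows
show the high nibble of the immediate byte. Columns show the low nibble of the immediate byte.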
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A.1.3 x87 Encodings 


All x87 instructions begin with an opcode byte in the range D8h to DFh, as shown in Table A-2 on 
page 508. These opcodes are followed by a ModRM byte that further defines the opcode. Table A-15 
shows both the opcode byte and the ModRM byte for each x87 instruction. 


There are two significant ranges for the ModRM byte for x87 opcodes: 00-BFh and C0-FFh. When
the value of the ModRM byte falls within the first range, 00-BFh, the opcode uses only the reg field to
further define the opcode. When the value of the ModRM byte falls within the second range, C0-FFh,
the opcode uses the entire ModRM byte to further define the opcode.
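
The two-range rule can be expressed as a small helper. This is a sketch for illustration; the function name `x87_extension` and its return convention are assumptions, not part of the architecture.

```python
def x87_extension(opcode, modrm):
    """Return which part of the ModRM byte extends an x87 opcode (D8h-DFh).

    ("reg", n)  -> ModRM in 00-BFh: only the reg field (/n) extends the opcode.
    ("byte", b) -> ModRM in C0-FFh: the entire ModRM byte extends the opcode.
    """
    if not 0xD8 <= opcode <= 0xDF:
        raise ValueError("not an x87 opcode byte")
    if modrm <= 0xBF:
        return "reg", (modrm >> 3) & 0b111
    return "byte", modrm


print(x87_extension(0xD9, 0xE8))  # register range: full byte E8h (FLD1 in Table A-15)
print(x87_extension(0xD8, 0x05))  # memory range: /0 (FADD mem32real)
```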


Byte values shown as reserved or invalid in Table A-15 have implementation-specific functions, 
which can include an invalid-opcode exception. 


The basic instructions FNSTENV, FNSTCW, FNCLEX, FNINIT, FNSAVE, and FNSTSW
do not check for possible floating-point exceptions before operating. Utility versions of these
mnemonics are provided that insert an FWAIT (opcode 9Bh) before the corresponding non-waiting
instruction. These are FSTENV, FSTCW, FCLEX, FINIT, FSAVE, and FSTSW. For further
information on waiting and non-waiting versions of these instructions, see their corresponding pages in
Volume 5.
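
As a small illustration of the relationship between the two forms, the sketch below builds the waiting encoding by prepending the FWAIT byte to a non-waiting encoding. The names `NON_WAITING` and `waiting_form` are assumptions made for the example, and only three register-form encodings from Table A-15 are included.

```python
FWAIT = 0x9B

# Non-waiting encodings taken from the DBh and DFh rows of Table A-15.
NON_WAITING = {
    "FNCLEX": bytes([0xDB, 0xE2]),
    "FNINIT": bytes([0xDB, 0xE3]),
    "FNSTSW AX": bytes([0xDF, 0xE0]),
}


def waiting_form(mnemonic):
    """Return (waiting mnemonic, encoding) for a non-waiting x87 control instruction."""
    encoding = bytes([FWAIT]) + NON_WAITING[mnemonic]
    return "F" + mnemonic[2:], encoding  # drop the "N" from the mnemonic


name, encoding = waiting_form("FNINIT")
print(name, encoding.hex(" "))
```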
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Table A-15. x87 Opcodes and ModRM Extensions 
Opcode | ModRM mod Field | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
D8 | !11 (00-BF) | FADD mem32real | FMUL mem32real | FCOM mem32real | FCOMP mem32real | FSUB mem32real | FSUBR mem32real | FDIV mem32real | FDIVR mem32real
D8 | 11 | C0 FADD ST(0), ST(0) | C8 FMUL ST(0), ST(0) | D0 FCOM ST(0), ST(0) | D8 FCOMP ST(0), ST(0) | E0 FSUB ST(0), ST(0) | E8 FSUBR ST(0), ST(0) | F0 FDIV ST(0), ST(0) | F8 FDIVR ST(0), ST(0)
D8 | 11 | C1 FADD ST(0), ST(1) | C9 FMUL ST(0), ST(1) | D1 FCOM ST(0), ST(1) | D9 FCOMP ST(0), ST(1) | E1 FSUB ST(0), ST(1) | E9 FSUBR ST(0), ST(1) | F1 FDIV ST(0), ST(1) | F9 FDIVR ST(0), ST(1)
D8 | 11 | C2 FADD ST(0), ST(2) | CA FMUL ST(0), ST(2) | D2 FCOM ST(0), ST(2) | DA FCOMP ST(0), ST(2) | E2 FSUB ST(0), ST(2) | EA FSUBR ST(0), ST(2) | F2 FDIV ST(0), ST(2) | FA FDIVR ST(0), ST(2)
D8 | 11 | C3 FADD ST(0), ST(3) | CB FMUL ST(0), ST(3) | D3 FCOM ST(0), ST(3) | DB FCOMP ST(0), ST(3) | E3 FSUB ST(0), ST(3) | EB FSUBR ST(0), ST(3) | F3 FDIV ST(0), ST(3) | FB FDIVR ST(0), ST(3)
D8 | 11 | C4 FADD ST(0), ST(4) | CC FMUL ST(0), ST(4) | D4 FCOM ST(0), ST(4) | DC FCOMP ST(0), ST(4) | E4 FSUB ST(0), ST(4) | EC FSUBR ST(0), ST(4) | F4 FDIV ST(0), ST(4) | FC FDIVR ST(0), ST(4)
D8 | 11 | C5 FADD ST(0), ST(5) | CD FMUL ST(0), ST(5) | D5 FCOM ST(0), ST(5) | DD FCOMP ST(0), ST(5) | E5 FSUB ST(0), ST(5) | ED FSUBR ST(0), ST(5) | F5 FDIV ST(0), ST(5) | FD FDIVR ST(0), ST(5)
D8 | 11 | C6 FADD ST(0), ST(6) | CE FMUL ST(0), ST(6) | D6 FCOM ST(0), ST(6) | DE FCOMP ST(0), ST(6) | E6 FSUB ST(0), ST(6) | EE FSUBR ST(0), ST(6) | F6 FDIV ST(0), ST(6) | FE FDIVR ST(0), ST(6)
D8 | 11 | C7 FADD ST(0), ST(7) | CF FMUL ST(0), ST(7) | D7 FCOM ST(0), ST(7) | DF FCOMP ST(0), ST(7) | E7 FSUB ST(0), ST(7) | EF FSUBR ST(0), ST(7) | F7 FDIV ST(0), ST(7) | FF FDIVR ST(0), ST(7)
Table A-15. x87 Opcodes and ModRM Extensions (continued)
Opcode | ModRM mod Field | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
D9 | !11 (00-BF) | FLD mem32real | | FST mem32real | FSTP mem32real | FLDENV mem14/28env | FLDCW mem16 | FNSTENV mem14/28env | FNSTCW mem16
D9 | 11 | C0 FLD ST(0), ST(0) | C8 FXCH ST(0), ST(0) | D0 FNOP | D8 reserved | E0 FCHS | E8 FLD1 | F0 F2XM1 | F8 FPREM
D9 | 11 | C1 FLD ST(0), ST(1) | C9 FXCH ST(0), ST(1) | D1 invalid | D9 reserved | E1 FABS | E9 FLDL2T | F1 FYL2X | F9 FYL2XP1
D9 | 11 | C2 FLD ST(0), ST(2) | CA FXCH ST(0), ST(2) | D2 invalid | DA reserved | E2 invalid | EA FLDL2E | F2 FPTAN | FA FSQRT
D9 | 11 | C3 FLD ST(0), ST(3) | CB FXCH ST(0), ST(3) | D3 invalid | DB reserved | E3 invalid | EB FLDPI | F3 FPATAN | FB FSINCOS
D9 | 11 | C4 FLD ST(0), ST(4) | CC FXCH ST(0), ST(4) | D4 invalid | DC reserved | E4 FTST | EC FLDLG2 | F4 FXTRACT | FC FRNDINT
D9 | 11 | C5 FLD ST(0), ST(5) | CD FXCH ST(0), ST(5) | D5 invalid | DD reserved | E5 FXAM | ED FLDLN2 | F5 FPREM1 | FD FSCALE
D9 | 11 | C6 FLD ST(0), ST(6) | CE FXCH ST(0), ST(6) | D6 invalid | DE reserved | E6 invalid | EE FLDZ | F6 FDECSTP | FE FSIN
D9 | 11 | C7 FLD ST(0), ST(7) | CF FXCH ST(0), ST(7) | D7 invalid | DF reserved | E7 invalid | EF invalid | F7 FINCSTP | FF FCOS
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Table A-15. x87 Opcodes and ModRM Extensions (continued) 
Opcode | ModRM mod Field | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
DA | !11 (00-BF) | FIADD mem32int | FIMUL mem32int | FICOM mem32int | FICOMP mem32int | FISUB mem32int | FISUBR mem32int | FIDIV mem32int | FIDIVR mem32int
DA | 11 | C0 FCMOVB ST(0), ST(0) | C8 FCMOVE ST(0), ST(0) | D0 FCMOVBE ST(0), ST(0) | D8 FCMOVU ST(0), ST(0) | E0 invalid | E8 invalid | F0 invalid | F8 invalid
DA | 11 | C1 FCMOVB ST(0), ST(1) | C9 FCMOVE ST(0), ST(1) | D1 FCMOVBE ST(0), ST(1) | D9 FCMOVU ST(0), ST(1) | E1 invalid | E9 FUCOMPP | F1 invalid | F9 invalid
DA | 11 | C2 FCMOVB ST(0), ST(2) | CA FCMOVE ST(0), ST(2) | D2 FCMOVBE ST(0), ST(2) | DA FCMOVU ST(0), ST(2) | E2 invalid | EA invalid | F2 invalid | FA invalid
DA | 11 | C3 FCMOVB ST(0), ST(3) | CB FCMOVE ST(0), ST(3) | D3 FCMOVBE ST(0), ST(3) | DB FCMOVU ST(0), ST(3) | E3 invalid | EB invalid | F3 invalid | FB invalid
DA | 11 | C4 FCMOVB ST(0), ST(4) | CC FCMOVE ST(0), ST(4) | D4 FCMOVBE ST(0), ST(4) | DC FCMOVU ST(0), ST(4) | E4 invalid | EC invalid | F4 invalid | FC invalid
DA | 11 | C5 FCMOVB ST(0), ST(5) | CD FCMOVE ST(0), ST(5) | D5 FCMOVBE ST(0), ST(5) | DD FCMOVU ST(0), ST(5) | E5 invalid | ED invalid | F5 invalid | FD invalid
DA | 11 | C6 FCMOVB ST(0), ST(6) | CE FCMOVE ST(0), ST(6) | D6 FCMOVBE ST(0), ST(6) | DE FCMOVU ST(0), ST(6) | E6 invalid | EE invalid | F6 invalid | FE invalid
DA | 11 | C7 FCMOVB ST(0), ST(7) | CF FCMOVE ST(0), ST(7) | D7 FCMOVBE ST(0), ST(7) | DF FCMOVU ST(0), ST(7) | E7 invalid | EF invalid | F7 invalid | FF invalid
Table A-15. x87 Opcodes and ModRM Extensions (continued)
Opcode | ModRM mod Field | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
DB | !11 (00-BF) | FILD mem32int | FISTTP mem32int | FIST mem32int | FISTP mem32int | invalid | FLD mem80real | invalid | FSTP mem80real
DB | 11 | C0 FCMOVNB ST(0), ST(0) | C8 FCMOVNE ST(0), ST(0) | D0 FCMOVNBE ST(0), ST(0) | D8 FCMOVNU ST(0), ST(0) | E0 reserved | E8 FUCOMI ST(0), ST(0) | F0 FCOMI ST(0), ST(0) | F8 invalid
DB | 11 | C1 FCMOVNB ST(0), ST(1) | C9 FCMOVNE ST(0), ST(1) | D1 FCMOVNBE ST(0), ST(1) | D9 FCMOVNU ST(0), ST(1) | E1 reserved | E9 FUCOMI ST(0), ST(1) | F1 FCOMI ST(0), ST(1) | F9 invalid
DB | 11 | C2 FCMOVNB ST(0), ST(2) | CA FCMOVNE ST(0), ST(2) | D2 FCMOVNBE ST(0), ST(2) | DA FCMOVNU ST(0), ST(2) | E2 FNCLEX | EA FUCOMI ST(0), ST(2) | F2 FCOMI ST(0), ST(2) | FA invalid
DB | 11 | C3 FCMOVNB ST(0), ST(3) | CB FCMOVNE ST(0), ST(3) | D3 FCMOVNBE ST(0), ST(3) | DB FCMOVNU ST(0), ST(3) | E3 FNINIT | EB FUCOMI ST(0), ST(3) | F3 FCOMI ST(0), ST(3) | FB invalid
DB | 11 | C4 FCMOVNB ST(0), ST(4) | CC FCMOVNE ST(0), ST(4) | D4 FCMOVNBE ST(0), ST(4) | DC FCMOVNU ST(0), ST(4) | E4 reserved | EC FUCOMI ST(0), ST(4) | F4 FCOMI ST(0), ST(4) | FC invalid
DB | 11 | C5 FCMOVNB ST(0), ST(5) | CD FCMOVNE ST(0), ST(5) | D5 FCMOVNBE ST(0), ST(5) | DD FCMOVNU ST(0), ST(5) | E5 invalid | ED FUCOMI ST(0), ST(5) | F5 FCOMI ST(0), ST(5) | FD invalid
DB | 11 | C6 FCMOVNB ST(0), ST(6) | CE FCMOVNE ST(0), ST(6) | D6 FCMOVNBE ST(0), ST(6) | DE FCMOVNU ST(0), ST(6) | E6 invalid | EE FUCOMI ST(0), ST(6) | F6 FCOMI ST(0), ST(6) | FE invalid
DB | 11 | C7 FCMOVNB ST(0), ST(7) | CF FCMOVNE ST(0), ST(7) | D7 FCMOVNBE ST(0), ST(7) | DF FCMOVNU ST(0), ST(7) | E7 invalid | EF FUCOMI ST(0), ST(7) | F7 FCOMI ST(0), ST(7) | FF invalid
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Table A-15. x87 Opcodes and ModRM Extensions (continued) 
Opcode | ModRM mod Field | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
DC | !11 (00-BF) | FADD mem64real | FMUL mem64real | FCOM mem64real | FCOMP mem64real | FSUB mem64real | FSUBR mem64real | FDIV mem64real | FDIVR mem64real
DC | 11 | C0 FADD ST(0), ST(0) | C8 FMUL ST(0), ST(0) | D0 reserved | D8 reserved | E0 FSUBR ST(0), ST(0) | E8 FSUB ST(0), ST(0) | F0 FDIVR ST(0), ST(0) | F8 FDIV ST(0), ST(0)
DC | 11 | C1 FADD ST(1), ST(0) | C9 FMUL ST(1), ST(0) | D1 reserved | D9 reserved | E1 FSUBR ST(1), ST(0) | E9 FSUB ST(1), ST(0) | F1 FDIVR ST(1), ST(0) | F9 FDIV ST(1), ST(0)
DC | 11 | C2 FADD ST(2), ST(0) | CA FMUL ST(2), ST(0) | D2 reserved | DA reserved | E2 FSUBR ST(2), ST(0) | EA FSUB ST(2), ST(0) | F2 FDIVR ST(2), ST(0) | FA FDIV ST(2), ST(0)
DC | 11 | C3 FADD ST(3), ST(0) | CB FMUL ST(3), ST(0) | D3 reserved | DB reserved | E3 FSUBR ST(3), ST(0) | EB FSUB ST(3), ST(0) | F3 FDIVR ST(3), ST(0) | FB FDIV ST(3), ST(0)
DC | 11 | C4 FADD ST(4), ST(0) | CC FMUL ST(4), ST(0) | D4 reserved | DC reserved | E4 FSUBR ST(4), ST(0) | EC FSUB ST(4), ST(0) | F4 FDIVR ST(4), ST(0) | FC FDIV ST(4), ST(0)
DC | 11 | C5 FADD ST(5), ST(0) | CD FMUL ST(5), ST(0) | D5 reserved | DD reserved | E5 FSUBR ST(5), ST(0) | ED FSUB ST(5), ST(0) | F5 FDIVR ST(5), ST(0) | FD FDIV ST(5), ST(0)
DC | 11 | C6 FADD ST(6), ST(0) | CE FMUL ST(6), ST(0) | D6 reserved | DE reserved | E6 FSUBR ST(6), ST(0) | EE FSUB ST(6), ST(0) | F6 FDIVR ST(6), ST(0) | FE FDIV ST(6), ST(0)
DC | 11 | C7 FADD ST(7), ST(0) | CF FMUL ST(7), ST(0) | D7 reserved | DF reserved | E7 FSUBR ST(7), ST(0) | EF FSUB ST(7), ST(0) | F7 FDIVR ST(7), ST(0) | FF FDIV ST(7), ST(0)
Table A-15. x87 Opcodes and ModRM Extensions (continued)
Opcode | ModRM mod Field | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
DD | !11 (00-BF) | FLD mem64real | FISTTP mem64int | FST mem64real | FSTP mem64real | FRSTOR mem94/108env | invalid | FNSAVE mem94/108env | FNSTSW mem16
DD | 11 | C0 FFREE ST(0) | C8 reserved | D0 FST ST(0) | D8 FSTP ST(0) | E0 FUCOM ST(0), ST(0) | E8 FUCOMP ST(0) | F0 invalid | F8 invalid
DD | 11 | C1 FFREE ST(1) | C9 reserved | D1 FST ST(1) | D9 FSTP ST(1) | E1 FUCOM ST(1), ST(0) | E9 FUCOMP ST(1) | F1 invalid | F9 invalid
DD | 11 | C2 FFREE ST(2) | CA reserved | D2 FST ST(2) | DA FSTP ST(2) | E2 FUCOM ST(2), ST(0) | EA FUCOMP ST(2) | F2 invalid | FA invalid
DD | 11 | C3 FFREE ST(3) | CB reserved | D3 FST ST(3) | DB FSTP ST(3) | E3 FUCOM ST(3), ST(0) | EB FUCOMP ST(3) | F3 invalid | FB invalid
DD | 11 | C4 FFREE ST(4) | CC reserved | D4 FST ST(4) | DC FSTP ST(4) | E4 FUCOM ST(4), ST(0) | EC FUCOMP ST(4) | F4 invalid | FC invalid
DD | 11 | C5 FFREE ST(5) | CD reserved | D5 FST ST(5) | DD FSTP ST(5) | E5 FUCOM ST(5), ST(0) | ED FUCOMP ST(5) | F5 invalid | FD invalid
DD | 11 | C6 FFREE ST(6) | CE reserved | D6 FST ST(6) | DE FSTP ST(6) | E6 FUCOM ST(6), ST(0) | EE FUCOMP ST(6) | F6 invalid | FE invalid
DD | 11 | C7 FFREE ST(7) | CF reserved | D7 FST ST(7) | DF FSTP ST(7) | E7 FUCOM ST(7), ST(0) | EF FUCOMP ST(7) | F7 invalid | FF invalid
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Table A-15. x87 Opcodes and ModRM Extensions (continued) 
Opcode | ModRM mod Field | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
DE | !11 (00-BF) | FIADD mem16int | FIMUL mem16int | FICOM mem16int | FICOMP mem16int | FISUB mem16int | FISUBR mem16int | FIDIV mem16int | FIDIVR mem16int
DE | 11 | C0 FADDP ST(0), ST(0) | C8 FMULP ST(0), ST(0) | D0 reserved | D8 invalid | E0 FSUBRP ST(0), ST(0) | E8 FSUBP ST(0), ST(0) | F0 FDIVRP ST(0), ST(0) | F8 FDIVP ST(0), ST(0)
DE | 11 | C1 FADDP ST(1), ST(0) | C9 FMULP ST(1), ST(0) | D1 reserved | D9 FCOMPP | E1 FSUBRP ST(1), ST(0) | E9 FSUBP ST(1), ST(0) | F1 FDIVRP ST(1), ST(0) | F9 FDIVP ST(1), ST(0)
DE | 11 | C2 FADDP ST(2), ST(0) | CA FMULP ST(2), ST(0) | D2 reserved | DA invalid | E2 FSUBRP ST(2), ST(0) | EA FSUBP ST(2), ST(0) | F2 FDIVRP ST(2), ST(0) | FA FDIVP ST(2), ST(0)
DE | 11 | C3 FADDP ST(3), ST(0) | CB FMULP ST(3), ST(0) | D3 reserved | DB invalid | E3 FSUBRP ST(3), ST(0) | EB FSUBP ST(3), ST(0) | F3 FDIVRP ST(3), ST(0) | FB FDIVP ST(3), ST(0)
DE | 11 | C4 FADDP ST(4), ST(0) | CC FMULP ST(4), ST(0) | D4 reserved | DC invalid | E4 FSUBRP ST(4), ST(0) | EC FSUBP ST(4), ST(0) | F4 FDIVRP ST(4), ST(0) | FC FDIVP ST(4), ST(0)
DE | 11 | C5 FADDP ST(5), ST(0) | CD FMULP ST(5), ST(0) | D5 reserved | DD invalid | E5 FSUBRP ST(5), ST(0) | ED FSUBP ST(5), ST(0) | F5 FDIVRP ST(5), ST(0) | FD FDIVP ST(5), ST(0)
DE | 11 | C6 FADDP ST(6), ST(0) | CE FMULP ST(6), ST(0) | D6 reserved | DE invalid | E6 FSUBRP ST(6), ST(0) | EE FSUBP ST(6), ST(0) | F6 FDIVRP ST(6), ST(0) | FE FDIVP ST(6), ST(0)
DE | 11 | C7 FADDP ST(7), ST(0) | CF FMULP ST(7), ST(0) | D7 reserved | DF invalid | E7 FSUBRP ST(7), ST(0) | EF FSUBP ST(7), ST(0) | F7 FDIVRP ST(7), ST(0) | FF FDIVP ST(7), ST(0)
Table A-15. x87 Opcodes and ModRM Extensions (continued)
Opcode | ModRM mod Field | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
DF | !11 (00-BF) | FILD mem16int | FISTTP mem16int | FIST mem16int | FISTP mem16int | FBLD mem80dec | FILD mem64int | FBSTP mem80dec | FISTP mem64int
DF | 11 | C0 reserved | C8 reserved | D0 reserved | D8 reserved | E0 FNSTSW AX | E8 FUCOMIP ST(0), ST(0) | F0 FCOMIP ST(0), ST(0) | F8 invalid
DF | 11 | C1 reserved | C9 reserved | D1 reserved | D9 reserved | E1 invalid | E9 FUCOMIP ST(0), ST(1) | F1 FCOMIP ST(0), ST(1) | F9 invalid
DF | 11 | C2 reserved | CA reserved | D2 reserved | DA reserved | E2 invalid | EA FUCOMIP ST(0), ST(2) | F2 FCOMIP ST(0), ST(2) | FA invalid
DF | 11 | C3 reserved | CB reserved | D3 reserved | DB reserved | E3 invalid | EB FUCOMIP ST(0), ST(3) | F3 FCOMIP ST(0), ST(3) | FB invalid
DF | 11 | C4 reserved | CC reserved | D4 reserved | DC reserved | E4 invalid | EC FUCOMIP ST(0), ST(4) | F4 FCOMIP ST(0), ST(4) | FC invalid
DF | 11 | C5 reserved | CD reserved | D5 reserved | DD reserved | E5 invalid | ED FUCOMIP ST(0), ST(5) | F5 FCOMIP ST(0), ST(5) | FD invalid
DF | 11 | C6 reserved | CE reserved | D6 reserved | DE reserved | E6 invalid | EE FUCOMIP ST(0), ST(6) | F6 FCOMIP ST(0), ST(6) | FE invalid
DF | 11 | C7 reserved | CF reserved | D7 reserved | DF reserved | E7 invalid | EF FUCOMIP ST(0), ST(7) | F7 FCOMIP ST(0), ST(7) | FF invalid
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A.1.4 rFLAGS Condition Codes for x87 Opcodes 


Table A-16 shows the rFLAGS condition codes specified by the opcode and ModRM bytes of the
FCMOVcc instructions.


Table A-16. rFLAGS Condition Codes for FCMOVcc

Opcode (hex) | ModRM mod Field | ModRM reg Field | rFLAGS Value | cc Mnemonic | Condition
DA | 11 | 000 | CF = 1 | B | Below
DA | 11 | 001 | ZF = 1 | E | Equal
DA | 11 | 010 | CF = 1 or ZF = 1 | BE | Below or Equal
DA | 11 | 011 | PF = 1 | U | Unordered
DB | 11 | 000 | CF = 0 | NB | Not Below
DB | 11 | 001 | ZF = 0 | NE | Not Equal
DB | 11 | 010 | CF = 0 and ZF = 0 | NBE | Not Below or Equal
DB | 11 | 011 | PF = 0 | NU | Not Unordered
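
The FCMOVcc selection in Table A-16 can be sketched as a lookup keyed on the opcode byte and the ModRM.reg field. The names `FCMOV_CC` and `fcmov_mnemonic` are assumptions made for this example; the function returns `None` for encodings that are not FCMOVcc.

```python
# (opcode byte, ModRM.reg) -> cc mnemonic, transcribed from Table A-16.
FCMOV_CC = {
    (0xDA, 0b000): "B",  (0xDA, 0b001): "E",
    (0xDA, 0b010): "BE", (0xDA, 0b011): "U",
    (0xDB, 0b000): "NB",  (0xDB, 0b001): "NE",
    (0xDB, 0b010): "NBE", (0xDB, 0b011): "NU",
}


def fcmov_mnemonic(opcode, modrm):
    """Return 'FCMOVcc' for a DAh/DBh opcode with ModRM.mod = 11b, else None."""
    mod = (modrm >> 6) & 0b11
    reg = (modrm >> 3) & 0b111
    if mod != 0b11:
        return None  # memory forms of DAh/DBh are not FCMOVcc
    cc = FCMOV_CC.get((opcode, reg))
    return "FCMOV" + cc if cc else None


print(fcmov_mnemonic(0xDA, 0xC1))  # FCMOVB ST(0), ST(1)
print(fcmov_mnemonic(0xDB, 0xD9))  # FCMOVNU ST(0), ST(1)
```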


A.1.5 Extended Instruction Opcode Maps 


The following sections present the VEX and the XOP extended instruction opcode maps. The
VEX.map_select field of the three-byte VEX encoding escape sequence selects VEX opcode maps:
01h, 02h, or 03h. The two-byte VEX encoding escape sequence implicitly selects the VEX map 01h.


The XOP.map_select field selects between the three XOP maps: 08h, 09h, or 0Ah.
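
The map selection can be sketched as follows. The byte layout used here is an assumption not stated in this section: the escape bytes are taken to be C4h (three-byte VEX), C5h (two-byte VEX), and 8Fh (XOP), with map_select in bits [4:0] of the byte that follows the escape byte. The function name `extended_map` is hypothetical, and the sketch ignores the cases where these bytes decode as other instructions (for example, 8Fh with a map_select value below 08h).

```python
def extended_map(code):
    """Return (encoding kind, opcode map number) for a VEX or XOP escape sequence."""
    escape = code[0]
    if escape == 0xC5:
        return "VEX", 0x01                 # two-byte VEX implies map 01h
    if escape in (0xC4, 0x8F):
        map_select = code[1] & 0x1F        # assumed position: bits [4:0]
        kind = "VEX" if escape == 0xC4 else "XOP"
        valid = (0x01, 0x02, 0x03) if kind == "VEX" else (0x08, 0x09, 0x0A)
        if map_select not in valid:
            raise ValueError("map_select value not defined for " + kind)
        return kind, map_select
    raise ValueError("not a VEX or XOP escape byte")


print(extended_map(bytes([0xC4, 0xE2, 0x79])))  # three-byte VEX, map 02h
print(extended_map(bytes([0x8F, 0xE9, 0x78])))  # XOP, map 09h
```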


VEX Opcode Maps. Tables A-17 through A-23 below present the VEX opcode maps, and Table A-24 on
page 542 presents the VEX opcode groups.
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Table A-17. VEX Opcode Map 1, Low Nibble = [0h:7h] 




Opcode 
00 


1x 


2x-4x 


5x 


6x 


7x 


8x-Bx 


cx 


Note 1: 


Note 2: 
Note 3: 


x0 x1 
VMOvuPs? VMovuPs? VMOVLPS VMOVLPS VUNPCKLPS” VUNPCKHPS? VMOVHPS VMOVHPS 
Vpsx, Wpsx Wpsx, Vpsx Vps, Hps, Mq Ma, Vps Vpsx, Hpsx, Wpsx | Vpsx, Hpsx, Wpsx Vps, Hps, Mq Ma, Vps 
VMOVHLPS VMOVLHPS 
Vps, Hps, Ups Vps, Hps, Ups 
VMOVUPD? VMOVUPD? VMOVLPD VMOVLPD VUNPCKLPD? VUNPCKHPD? VMOVHPD VMOVHPD 
Vpdx, Wpdx Wpdkx, Vpdx Vo, Ho, Mq Mq, Vo Vpdx, Hpdx, Wpdx | Vpdx, Hpdx, Wpdx Vpd, Hpd, Mq Mq, Vpd 
vMovss® VMOVSSs® VMOVSLDUP* VMOVSHDUP? 
Vss, Md Md, Vss Vpsx, Wpsx Vpsx, Wpsx 
VMOVSS VMOVSS 
Vss, Hss, Uss Uss, Hss, Vss 
VMovsD* VMOVSD®* vmMovobuP 
Vsd, Mq Mq, Vsd Vo, Wq (L=0) 
VMOVSD VMOVSD Vdo, Wdo (L=1) 


Vsd, Hsd, Usd 


Usd, Hsd, Vsd 


VMOVMSKPS? VSQRTPS* VRSQRTPS* VRCPPS? VANDPS? VANDNPS? VoRPS? VXORPS? 
Gy, Upsx Vpsx, Wpsx Vpsx, Wpsx Vpsx, Wpsx Vpsx, Hpsx, Wpsx | Vpsx, Hpsx, Wpsx | Vpsx, Hpsx, Wpsx | Vpsx, Hpsx, Wpsx 
VMOVMSKPD? VSQRTPD? VANDPD? VANDNPD? VORPD* VXORPD? 
Gy, Updx Vpdx, Wpdx Vpdx, Hpdx, Wpdx | Vpdx, Hpdx, Wpdx | Vpdx, Hpdx, Wpdx | Vpdx, Hpdx, Wpdx 
VSQRTSS® VRSQRTSS* VRCPSS® 
Vo, Ho, Wss Vo, Ho, Wss Vo, Ho, Wss 
VSQRTSD* 
Vo, Ho, Wsd 
VPUNPCKLBW? VPUNPCKLWD* | VPUNPCKLDQ? VPACKSSWB* VPCMPGTB* VPCMPGTW? VPCMPGTD? VPACKUSWB? 


Vpbx, Hpbx, Wpbx 


Vpwx, Hpwx, Wpwx| 


Vpdwx, Hpdwx, 
Wpdwx 


Vpkx, Hpix, Wpix 


Vpbx, Hpkx, Wpkx | Vpwx, Hpix, Wpix 


Vpdwx, Hpjx, Wpjx 


Vpkx, Hpix, Wpix 


VZEROUPPER (L=0) 


VZEROALL (L=1) 


VPSHUFD? 
Vpdwx, Wpdwx, Ib 


VEX group #12 


VEX group #13 


VEX group #14 


VPCMPEQB 
Vpbx, Hpkx, Wpkx 


VPCMPEQW* 
Vpwx, Hpix, Wpix 


VPCMPEQD* 
Vpdwx, Hpjx, Wpjx 


VPSHUFHW? 
Vpwx, Wpwx, Ib 


VPSHUFLW? 
Vpwx, Wpwx, Ib 


VCMPccPS* 
Vpdw, Hps, Wps, 

Ib 
VCMPccPD* 
Vpqw, Hpd, Wpd, 
Ib 


VPINSRW. 
Vpw, Hpw, Mw, Ib 
Vpw, Hpw, Rd, Ib 


VPEXTRW. 
Gw, Upw, Ib 


VSHUFPS* 
Vpsx, Hpsx, Wpsx, 

Ib 
VSHUFPD? 
Vpdx, Hpdx, Wpdx, 
Ib 


VCMPccSS* 
Vd, Hss, Wss, Ib 


VCMPccSD* 
Vq, Hsd, Wsd, Ib 


Note 1: The condition codes are: EQ, LT, LE, UNORD, NEQ, NLT, NLE, and ORD; encoded as [00:07h] using Ib.
VEX encoding adds: EQ_UQ, NGE, NGT, FALSE, NEQ_OQ, GE, GT, TRUE [08:0Fh];
EQ_OS, LT_OQ, LE_OQ, UNORD_S, NEQ_US, NLT_UQ, NLE_UQ, ORD_S [10h:17h]; and
EQ_US, NGE_UQ, NGT_UQ, FALSE_OS, NEQ_OS, GE_OQ, GT_OQ, TRUE_US [18:1Fh].
Note 2: Supports both 128-bit and 256-bit vector sizes. Vector size is specified using the VEX.L bit. When L = 0, size is 128 bits; when L = 1, size is 256 bits.
Note 3: Operands are scalars. VEX.L bit is ignored.
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Table A-18. VEX Opcode Map 1, Low Nibble = [0h:7h] Continued 
x1 x2 x3 x4 x5 


/EX.pp|Opcode x0 


00 
VADDSUBPD* VPSRLW? VPSRLD* VPSRLQ? VPADDQ? VPMULLW? VPMOVMSKB? 
01 Vpdx, Hpdx, Wpdx | Vpwx, Hpwx, Wx |Vpdwx, Hpdwx, Wx] Vpqwx, Hpqwx, Wx} Vpq, Hpq, Wpq Vpix, Hpix, Wpix Gy, Upbx 
Dx 
10 


VADDSUBPS" 
Vpsx, Hpsx, Wpsx 


VPAVGB* VPSRAW? VPSRAD? VPAVGW? VPMULHUW? VPMULHW VCVTTPD2DQ° VMOVNTDQ 
01 Vpkx, Hpkx, Wpkx | Vpwx, Hpwx, Wx |Vpdwx, Hpdwx, Wx] Vpix, Hpix, Wpix Vpi, Hpi, Wpi Vpi, Hpi, Wpi Vpjx, Wpdx Mo, Vo (L=0) 
Mdo, Vdo (L=1) 


VCVTDQ2PD* 
Vpdx, Wpjx 


VCVTPD2DQ? 
Vpjx, Wpdx 


VPSLLW? VPSLLD* VPSLLO? VPMULUDQ? VPMADDWD* VPSADBW7 VMASKMOVDQU 
Vpwx, Hpwx, Wo.qx| Vpdwx, Hpdwx, Vpqwx, Hpqwx, Vpqx, Hpjx, Wpjx | Vpjx, Hpix, Wpix | Vpix, Hpkx, Wpkx Vpb, Upb 
Wo.qx Wo.qx 


VLDDQU 
Vo, Mo (L=0) 
Vdo, Mdo (L=1) 
The condition codes are: EQ, LT, LE, UNORD, NEQ, NLT, NLE, and ORD; encoded as [00h:07h] using Ib.
VEX encoding adds: EQ_UQ, NGE, NGT, FALSE, NEQ_OQ, GE, GT, TRUE [08h:0Fh];
EQ_OS, LT_OQ, LE_OQ, UNORD_S, NEQ_US, NLT_UQ, NLE_UQ, ORD_S [10h:17h]; and
EQ_US, NGE_UQ, NGT_UQ, FALSE_OS, NEQ_OS, GE_OQ, GT_OQ, TRUE_US [18h:1Fh].
Supports both 128 bit and 256 bit vector sizes. Vector size is specified using the VEX.L bit. When L = 0, size is 128 bits; when L = 1, size is 256 bits. 
Operands are scalars. VEX.L bit is ignored. 
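
The predicate list in the notes above maps directly onto the Ib immediate of the VCMPcc instructions. The following is a minimal Python sketch of that mapping, assuming the predicates are numbered in the order the notes list them; the names `VCMP_PREDICATES` and `vcmp_imm8` are illustrative and not part of the architecture definition.

```python
# Predicate names in encoding order, following the table notes.
VCMP_PREDICATES = [
    # 00h-07h: available in legacy and VEX encodings
    "EQ", "LT", "LE", "UNORD", "NEQ", "NLT", "NLE", "ORD",
    # 08h-0Fh: added by the VEX encoding
    "EQ_UQ", "NGE", "NGT", "FALSE", "NEQ_OQ", "GE", "GT", "TRUE",
    # 10h-17h: added by the VEX encoding
    "EQ_OS", "LT_OQ", "LE_OQ", "UNORD_S", "NEQ_US", "NLT_UQ", "NLE_UQ", "ORD_S",
    # 18h-1Fh: added by the VEX encoding
    "EQ_US", "NGE_UQ", "NGT_UQ", "FALSE_OS", "NEQ_OS", "GE_OQ", "GT_OQ", "TRUE_US",
]


def vcmp_imm8(predicate: str, vex: bool = True) -> int:
    """Return the Ib value for a comparison predicate (hypothetical helper).

    Raises ValueError for an unknown name, or for a VEX-only predicate
    when vex is False.
    """
    code = VCMP_PREDICATES.index(predicate.upper())  # ValueError if unknown
    if code > 0x07 and not vex:
        raise ValueError(f"{predicate} requires the VEX encoding")
    return code
```

For example, `vcmp_imm8("NEQ_OQ")` yields 0Ch, while `vcmp_imm8("GE", vex=False)` is rejected because only the first eight predicates exist outside the VEX encoding.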


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Table A-19. VEX Opcode Map 1, Low Nibble = [8h:Fh] 


Supports both 128 bit and 256 bit vector sizes. Vector size is specified using the VEX.L bit. When L = 0, size is 128 bits; when L = 1, size is 256 bits.


Operands are scalars. VEX.L bit is ignored. 


VEX.pp| Opcode x8 x9 xA xB xC 
eee Ox-1x 50.0 
VMOVAPS* VMOVAPS* VMOVNTPS* VUCOMISS* VCOMISs? 
00 Vpsx, Wpsx Wpsx, Vpsx Mpsx, Vpsx Vss, Wss Vss, Wss 
VMOVAPD* VMOVAPD* VMOVNTPD* VUCOMISD? VCOMISD? 
o1 Vpdx, Wpdx Wpdx, Vpdx Mpdx, Vpdx Vsd, Wsd Vsd, Wsd 
2x 2 J 2 
VCVTSI2SS' VCVTTSS2SI VCVTSS2SI 
10 Vo, Ho, Ey Gy, Wss Gy, Wss 
VCVTSI2SD* VcvTTsD2si vcvTsp2si 
11 Vo, Ho, Ey Gy, Wsd Gy, Wsd 
eee | 3x-4x eee 
VADDPS* VMULPS* VCVTPS2PD* VCVTDQ2Ps* VSUBPS* VMINPS* vpIvps* VMAXPS* 
00 Vpsx, Hpsx, Wpsx | Vpsx, Hpsx, Wpsx Vpdx, Wpsx Vpsx, Wpjx Vpsx, Hpsx, Wpsx | Vpsx, Hpsx, Wpsx | Vpsx, Hpsx, Wpsx | Vpsx, Hpsx, Wpsx 
VADDPD* VMULPD* VCVTPD2PS* VCVTPS2DQ* VSUBPD* VMINPD* VDIVPD* VMAXPD* 
01 Vpdx, Hpdx, Wpdx | Vpdx, Hpdx, Wpdx Vpsx, Wpdx Vpjx, Wpsx Vpdx, Hpdx, Wpdx | Vpdx, Hpdx, Wpdx | Vpdx, Hpdx, Wpdx | Vpdx, Hpdx, Wpdx 
5x 2 2 2 1 2 2 2 2 
VADDSS' VMULSS VCVTSS2SD VCVTTPS2DQ VSUBSS' VMINSS' VDIVSS VMAXSS 
10 Vss, Hss, Wss Vss, Hss, Wss Vo, Ho, Wss Vpjx, Wpsx Vss, Hss, Wss Vss, Hss, Wss Vss, Hss, Wss Vss, Hss, Wss 
VADDSD VMULSD* VCVTSD2sS* VSUBSD* VMINSD? VDIVsD? VMAXSD* 
11 Vsd, Hsd, Wsd Vsd, Hsd, Wsd Vo, Ho, Wsd Vsd, Hsd, Wsd Vsd, Hsd, Wsd Vsd, Hsd, Wsd Vsd, Hsd, Wsd 
00 
VPUNPCKHBW* VPUNPCKHWD* VPUNPCKHDQ* VPACKSSDW" VPUNPCKLQDQ"* VPUNPCKHQDQ"* VMOVD VMOVQ VMOVDQAt 
01 Vpbx, Hpbx, Wpbx |Vpwx, Hpwx, Wpwx| Vpdwx, Hpdwx, Vpix, Hpjx, Wpjx Vpqwx, Hpqwx, Vpqwx, Hpqwx, Vo, Ey Vpqwx, Wpqwx 
6x Wpdwx Wpqwx Wpqwx (VEX.L=0) 
VMOVDQU’ 
10 Vpqwx, Wpqwx 
st, 
00 
VHADDPD! VHSUBPD* VMOVD VMOVQ VMOVDQA* 
o1 Vpdx, Hpdx, Wpdx | Vpdx, Hpdx, Wpdx Ey, Vo Wpqwx, Vpqwx 
(VEX.L=1) 
7x VMOVQ VMOVDQU* 
10 Va, Wq Wpqwx, Vpqwx 
(VEX.L=0) 
VHADDPS* VHSUBPS* 
11 Vpsx, Hpsx, Wpsx | Vpsx, Hpsx, Wpsx 
alae = aa ES 
VEX group #15 
n/a Ax 
BX-CX —SSSS_—ESSS_ ESSSsas 
00 
Dx VPSUBUSB* VPSUBUSW* VPMINUB* VPAND* VPADDUSB* VPADDUSW* VPMAXUB* VPANDN* 
01 Vpkx, Hpkx, Wpkx | Vpix, Hpix, Wpix | Vpkx, Hpkx, Wpkx Vx, Hx, Wx Vpkx, Hpkx, Wpkx | Vpix, Hpix, Wpix | Vpkx, Hpkx, Wpkx Vx, Hx, Wx 
00 
Ex VPSUBSB* VPSUBSW"* VPMINSW* VPOR* VPADDSB* VPADDSW* VPMAXSW* 
o1 Vpkx, Hpkx, Wpkx | Vpix, Hpix, Wpix Vpix, Hpix, Wpix Vx, Hx, Wx Vpkx, Hpkx, Wpkx | Vpix, Hpix, Wpix Vpix, Hpix, Wpix 
Fx VPSUBB* VPSUBW* VPSUBD* VPSUBQ* VPADDB* VPADDW"* VPADDD* 
01 Vpkx, Hpkx, Wpkx | Vpix, Hpix, Wpix Vpxj, Hpjx, Wpjx | Vpqx, Hpqx, Wpgx | Vpkx, Hpkx, Wpkx | Vpix, Hpix, Wpix Vpjx, Hpjx, Wpjx 


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Table A-20. VEX Opcode Map 2, Low Nibble = [0h:7h] 


VPSHUFB" 
Vpbx, Hpbx, Wpbx 


VPMOVSXBW" 
Vpix, Wpkx 


VPMOVZXBW" 
Vpix, Wpkx 


VPMULLD* 
Vpix, Hpjx, Wpxj 


°VPGATHERD- 
D' Vx, M*d, Hpdw (W=0) 
a’ vx, M*q, Hpqwx (W=1) 


VPHADDW" 
Vpix, Hpix, Wpix 


VPMOVSXBD" 
Vpjx, Wpkx 


VPMOVZXBD" 
Vpjx, Wpkx 


VPHMINPOSUW 
Vo, Wpi 


°VPGATHERQ- 
D' Vx, M*d, Hpdw (W=0) 
Qt Vx, M*g, Hpqw (W=1) 


VPHADDD™ 
Vpix, Hpjx, Wpjx 


VPMOVSXBQ 
Vpqx, Wpkx 


VPMOVZxBQ? 
Vpqx, Wpkx 


°VGATHERD- 
PS’ Vx,M*ps,Hpsx (\W=0) 
PD! Vx,M*pd,Hpdx (WW=1) 


Gy, By, Ey 


VPHADDSW" 
Vpix, Hpix, Wpix 


VCVTPH2PS: 
Vpsx, Wphx 


VPMOVSXWD" 
Vpjx, Wpix 


VPMOVZXWD" 
Vpjx, Wpix 


*VGATHERQ- 
PS’ Vx,M*ps,Hps (W=0) 
PD! Vx,M*pd,Hpdx (WW=1) 


VEX group #17 


VEX.pp| Opcode x0 x1 x2 x3 x4 x5 


VPMADDUBSW" 
Vpix, Hpkx, Wpkx 


vPMOVsXWwa! 
Vpax, Wpix 


vemovzxwat 
Vpax, Wpix 


x6 


VPHSUBD! 
Vpix, Hpjx, Wpjx 


VPHSUBW" 
Vpix, Hpix, Wpix 


VPERMPS 
Vps, Hd, Wps 


VvPMOVSXDQ? 
Vpgx, Wj 


VPERMD 
Vd, Hd, Wd 


VPMOVZXD 


VPSRLV- 
D' Vx, Hx, Wx 
qi Vx, Hx, Wx 


VPSRAVD" 
Vpdwx, Hpdwx, 


H: 


Gy, Ey, By 


VPHSUBSW" 
Vpix, Hpix, Wpix 


VPTEST 
Vx, Wx 


VPCMPGTQ! 
Vax, Hpax, Wpax 


VPSLLV- 
D' Vx, Hx, Wx (W=0 


>VFMSUBADD132- 
PS! Vx, Hx, Wx 
PD! Vx,Hx,Wx 
VFMSUBADD 
PS’ Vx, Hx, Wx 
PD! Vx,Hx, Wx 
VFMSUBADD231- 
PS! Vx, Hx, Wx 
PD! Vx,Hx,Wx 


BEXTR 
Gy, Ey, By 


PEXT 
Gy, By, Ey 


PDEP 
Gy, By, Ey 


MULX 
Gy, By, Ey 


Supports both 128 bit and 256 bit vector sizes. Vector size is specified using the VEX.L bit. When L = 0, size is 128 bits; when L = 1, size is 256 bits. 
For all VFMADDSUBnnnPS instructions, the data type is packed single-precision floating point. 
For all VFMADDSUBnnnPD instructions, the data type is packed double-precision floating point. 
For all VFMSUBADDnnnPS instructions, the data type is packed single-precision floating point. 
For all VFMSUBADDnnnPD instructions, the data type is packed double-precision floating point. 
Operands are treated as bit vectors.
Uses VSIB addressing mode. 


SHLX 
Gy, Ey, By 


SARX 
Gy, Ey, By 


SHRX 
Gy, Ey, By 
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Table A-21. VEX Opcode Map 2, Low Nibble = [8h:Fh] 


VEX.pp|Opcode x8 Y og | XA 
VPSIGNB? vpsiGnw! VPSIGNO! 
01 Vpkx, Hpkx, Wpkx | Vpi, Hpi, Wpi | Vpjx, Hpjx, Wpjx 
VBROADCASTSS! | VBROADCASTSD | VBROADCASTF128 
01 1x Vps, Wss Vdo, Mo 
(VEX.L=1) 
VPMULDQ? vecmpeaqg? | vMOVNTDQA? 
01 Vpax, Hpjx , Wpjx | Vpqx, Hpqx, Wpqx | Vx, Mx 
VPMINSB! VPMINSD? vPMINUW! 
01 Vpkx, Hpkx, Wpkx | Vpjx, Hpjx, Wpjx | Vpix, Hpix, Wpix 
i 
VFMADD132- VFMADD132- ‘VEMSUB132- 
01 PS? Vx,Hx, Wx (W=0) | SS? Vo,Ho, Wd (W=0) | PS'Vx, Hx, Wx (W=0) 
PD! Vx, Hx, Wx (W=1)|SD° Vo,Ho,Wa (W=1) PD! Vx,Hx, Wx (W=1) 
VFMADD213- VFMADD213- VFMSUB213- 
01 PS" Vx,Hx, Wx (W=0} | SS” Vo,Ho, Wd (W=0) | PS? Vx,Hx, Wx (WW=0) 
PO" Vx, Hx, Wx (W=1)|SD” Vo,Ho,Wq (W=1) |PD' Vx, Hx, Wx (W=1) 
VFMADD231- VFMADD231- VFMSUB231- 
01 PS! Vx,Hx, Wx (W=0) 5s? Vo,Ho,Wd (W=0) PS!Vx,Hx, Wx (W=0) 
PO! Vx,Hx,Wx (W=1}|SD” Vo,Ho, Wq (W=1) PD! Vx,Hx,Wx (W=1) 


Operands are scalars. VEX.L bit is ignored. 


xB 


VPMULHRSW! 
Vpix, Hpix, Wpix 


VPACKUSDW! 
Vpix, Hpjx, Wpjx 


VPMINUD! 
Vpjx, Hpjx, Wpjx 


VFMSUB132- 
$5” Vo,Ho, Wd (W=0) 
SD’ Vo,Ho, Wa (W=1) 
VFMSUB213- 
55’ Vo,Ho,Wd (W=0) 
SD’ Vo, Ho,Waq (W=1) 
VFMSUB231- 
$5” Vo,Ho,Wd (W=0) 
5D’ Vo,Ho, Wa (W=1) 


VAESIMC 
Vo, Wo 


xC 


VPERMILPS? 
psx, Hpsx, Wpdwx 


VPABSB 
Vpkx, Wpkx 


VMASKMOVeS? 
Vpsx, Hx, Mpsx 


VPMAXSB? 
Vpkx, Hpkx, Wpkx 


VPMASKMOV- 
D' Vx, Hx, Mx (W=0) 
a! Vx, Hx, Mx (W=1 
VFENMADD132- 
PS* Vx, Hx, Wx (W=0) 
PD! Vx,Hx,Wx (W=1) 
VFNMADD213- 
PS* Vx, Hx, Wx (W=0) 
PD! Vx,Hx, Wx (W=1) 
VENMADD231- 
PS Vx,Hx, Wx (W=0) 
PD! Vx, Hx, Wx (W=1) 


VAESENC 
Vo, Ho, Wo 


For all VFMADDnnnPS instructions, the data type is packed single-precision floating point. 
For all VFMADDnnnPD instructions, the data type is packed double-precision floating point. 
For all VFMSUBnnnPS instructions, the data type is packed single-precision floating point. 
For all VFMSUBnnnPD instructions, the data type is packed double-precision floating point.


xD xE XF 
VPERMILPD! VTESTPS? vresTPo! 
Vpdx, Hpdx, Wpqwx} = Vpsx, Wpsx Vpdx, Wpdx 
VPABSW' VPABSD’ 
Vpix, Wpix Vpjx, Wpjx 
VMASKMOVvPD! | vMASKMOVPS! | VMASKMOVPD' 
Vpdx, Hx, Mpdx Mpsx, Hx, Vpsx Mpdx, Hx, Vpdx 
VPMAXSD! vpmaxuw! veMAXuD? 
Vpxj, Hpjx, Wpjx | Vpix, Hpix, Wpix | Vpjx, Hpjx, Wpjx 
VPMASKMOV- 


O' Mx, Hx, Vx (W=0) 
Q! Mx, Hx, Vx(W=1) 
VFNMADD132- | VENMSUB132- 
55’ Vo,Ho,Wd (W=0) | PS Vx,Hx, Wx (W=0) 
5D” Vo,Ho,Wa (W=1) |PD" Vx, Hx, Wx (W=1) 
VENMADD213- | VENMSUB213- 
$5” Vo,Ho, Wd (W=0) | PS? Vx,Hx, Wx (W=0) 
SD” Vo,Ho, Wa (W=1) |PO' Vx, Hx, Wx (W=1) 
VENMADD231- | VENMSUB231- 
$8” Vo, Ho,Wd (W=0) | PS! Vx,Hx, Wx (W=0) 
5D’ Vo,Ho,Wq (W=1) |PO' Vx, Hx, Wx (W=1) 


VAESENCLAST 
Vo,Ho, Wo 


VAESDEC 
Vo, Ho, Wo 


VENMSUB132- 
5S Vo,Ho,Wd (W=0) 
SD’ Vo,Ho,Wq (W=1) 
VENMSUB213- 
5S’ Vo,Ho, Wd (W=0) 
SD” Vo, Ho,Wa (W=1) 
VENMSUB231- 
58’ Vo,Ho,Wd (W=0) 
5D’ Vo,Ho,Wa (W=1) 


VAESDECLAST 
Vo, Ho, Wo 


Supports both 128 bit and 256 bit vector sizes. Vector size is specified using the VEX.L bit. When L = 0, size is 128 bits; when L = 1, size is 256 bits.
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Table A-22. VEX Opcode Map 3, Low Nibble = [0h:7h] 


VEX.pp} Nibble x0 xl 
00 
r——] 0x 
VPERMQ VPERMPD VPBLENDD* VPERMILPS! VPERMILPD: VPERM2F128 
01 Va, Wa, Ib Vpd, Wpd, Ib Vpdwx, Hpdwx, Vpsx, Wpsx, Ib | Vpdx, Wpdx, lb | Vdo, Ho, Wo, Ib 
Wodwx, Ib (VEX.L=1) 
00 
1x VPEXTRB VPEXTRW VPEXTRD VEXTRACTPS 
O1 Mb, Vpb, |b Mw, Vpw, Ib Ed, Vpdw, Ib Mss, Vps, Ib 
VPEXTRB VPEXTRW. VPEXTRQ VEXTRACTPS 
Ry, Vpb, Ib Ry, Vpw, Ib Eq, Vpqw, Ib Rss, Vps, Ib 
00 
VPINSRD 
2x 
01 VPINSRB VINSERTPS 


Vpb, Hpb, Wb, Ib | Vps, Hps, Ups/Md, VPINSRQ 
Vpdw, Hpqw, Eq, Ib 


3x 
00 
r_— 4x 7 
VDPPS VDPPD VPCLMULQDQ VPERM21128 
01 Vpsx, Hpsx, Wpsx, | Vpd, Hpd, Wod, Ib | Vpix, Hpkx, Wpkx, Vo, Hpq, Wpaq, Ib Vo, Ho, Wo, ib 
lb 
5x 
00 
r—_ 6x 
VPCMPESTRM VPCMPESTRI VPCMPISTRM VPCMPISTRI 
01 Vo, Wo, Ib Vo, Wo, Ib Vo, Wo, Ib Vo, Wo, Ib 
7x-Ex 
10 
re RORX 
11 Gy, Ey, ib 


Supports both 128 bit and 256 bit vector sizes. Vector size is specified using the VEX.L bit. When L=0, size is 128 bits; when L=1, size is 256 bits.
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Table A-23. VEX Opcode Map 3, Low Nibble = [8h:Fh] 
VEX.pp|Opcode x8 x9 XA xB xC xD XE XF 
VROUNDPS’ | VROUNDPD' |  VROUNDSS VROUNDSD VBLENDPS" VBLENDPD’ | VPBLENDW' |  VPALIGNR’ 
01 Ox Vpsx, Wpsx, Ib | Vpdx, Wpdx, Ib | Vss, Hss, Wss, Ib | Vsd, Hsd, Wsd, Ib | Vpsx, Hpsx, Wpsx, | Vpdx, Hodx, Wpdx, Vpwx, Hpwx, Wowx/Vpbx, Hpbx, Wpbx, 
VINSERTF128 | VEXTRACTF128 VCVTPS2PH! 
01 | 1X | Vdo,Hdo,Wo, lb | Wo, Vdo, Ib Woh, Vps, Ib 
ooo || Sex 
VINSERTI128 VEXTRACTI128 
01 | 3x | Vdo,Hdo,Wo, lb | — Wo, Vdo, Ib 
VPERMILzz2PS’” | VPERMILzz2PD’" | VBLENDVPS’ | VBLENDVPD’ | VPBLENDVB* 
Vpsx, Hpsx, Wpsx, | Vpdx, Hpdx, Wpdx | Vpsx, Hpsx, Wpsx, |Vpdx, Hpdx, Wpdx, |Vpbx, Hpbx, Wpbx, 
01 4x Lpsx, Ib (W=0) |) Lpdk, Ib (W=0) Lpdx Lpdx Lx 
Vpsx, Hpsx, Lpsx, | Vpdx, Hpdx, Lpdx, 
Wosx, lb (W=1) | Wopdk, Ib (W=1 
VFMADDSUBPS’ VFMSUBADDPD" 
Vpsx, Lpsx, Wpsx, Vpdx, Lpdx, Wpdx, 
01 5x Hpsx (W=0) Hpdx (W=0 Hpsx (W=0) Hpdx (W=0) 
Vpsx, Lpsx, Hpsx, | Vpdx, Lpdx, Hpdx, | Vpsx, Lpsx, Hpsx, | Vpdx, Lodx, Hpdx, 
Wpsx 
VEMADDPS* VFMADDPD" VEMADDSS VEMADDSD VEMSUBSD 
Vpsx, Lpsx, Wpsx, | Vpdx, Lpdx, Wpdx, |Vss, Lss, Wss, Hss |Vsd, Lsd, Wsd, Hs) Vpsx, Lpsx, Wpsx, Vsd, Lsd, Wsd, Hsd 
01 6x Hpsx (W=0} Hpdx (W=0) W=0) W=0) Hpsx (W=0) W=0) 
Vpsx, Lpsx, Hpsx, | Vpdx, Lpdx, Hpdx, |Vss, Lss, Hss, Wss |Vsd, Lsd, Hsd, Wsd} Vpsx, Lpsx, Hpsx, Vsd, Lsd, Hsd, Wsd 
Wpsx (W=1 Wodx (W=1 Wpsx 
VENMADDPS: | VFNMADDPD! | VENMADDSS VENMADDSD | VENMSUBPS! VENMSUBSD 
Vpsx, Lpsx, Wpsx, | Vpdx, Lpdx, Wpdx, |Vss, Lss, Wss, Hss |Vsd, Lsd, Wsd, Hs) Vpsx, Lpsx, Wpsx, Vsd, Lsd, Wsd, Hsd 
01 | 7x Hpsx (W=0) Hpdx (W=0) W=0) W=0) Hpsx (W=0) W=0) 
Vpsx, Lpsx, Hpsx, | Vpdx, Lpdx, Hpdx, |Vss, Lss, Hss, Wss |Vsd, Lsd, Hsd, Wsd} Vpsx, Losx, Hpsx, Vsd, Lsd, Hsd, Wsd 
Wosx (W=1) Wodx (W=1) W=1) W=1) Wosx (W=1) W=1) 
wae [8X-CX nae 
VAESKEYGEN- 
01 Dx ASSIST 
Vo, Wo, Ib 
Ex-Fx 


Supports both 128 bit and 256 bit vector sizes. Vector size is specified using the VEX.L bit. When L=0, size is 128 bits; when L=1, size is 256 bits.
The zero match codes are TD, TD (alias), MO, and MZ. They are encoded as the zzzz field of the Ib,
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Note: 


BLSR 
By, Ey 


VPSRLW* 
Hpwx, Upwx, Ib 


VPSRLD* 
Hpdwx, Updwx, Ib 


VPSRLQ* 
Hpqwx, Upqwx, Ib 


BLSMSK 
By, Ey 


ModRM Byte 


VPSRLDQ? 
Hpbx, Upbx, Ib 


xX100xxx 
VPSRAW* 
Hpwx, Upwx, Ib 


VPSRAD* 
Hpdwx, Updwx, Ib 


xx101xxx 


xXX110xxx 
VPSLLW* 
Hpwx, Upwx, Ib 


VPSLLD* 
Hpdwx, Updwx, Ib 


VPSLLQ* 
Hpqwx, Upqwx, Ib 


xX111xxx 


VPSLLDQ* 
Hpbx, Upbx, Ib 


BLSI 
By, Ey 


1. Supports both 128 bit and 256 bit vector sizes. Vector size is specified using the VEX.L bit. When L=0, size is 128 bits; when L=1, size is 256 bits. 


XOP Opcode Maps. Tables A-25 through A-30 below present the XOP opcode maps, and Table A-31 on
page 544 presents the XOP opcode groups.


Table A-25. XOP Opcode Map 8h, Low Nibble = [0h:7h] 


XOP.pp | Opcode | x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7
00 | 8x | | | | | | VPMACSSWW Vo,Ho,Wo,Lo | VPMACSSWD Vo,Ho,Wo,Lo | VPMACSSDQL Vo,Ho,Wo,Lo
00 | 9x | | | | | | VPMACSWW Vo,Ho,Wo,Lo | VPMACSWD Vo,Ho,Wo,Lo | VPMACSDQL Vo,Ho,Wo,Lo
00 | Ax | | | VPCMOV Vx,Hx,Wx,Lx (W=0); Vx,Hx,Lx,Wx (W=1) | VPPERM Vo,Ho,Wo,Lo (W=0); Vo,Ho,Lo,Wo (W=1) | | | VPMADCSSWD Vo,Ho,Wo,Lo |
00 | Bx | | | | | | | VPMADCSWD Vo,Ho,Wo,Lo |
00 | Cx | VPROTB Vo,Wo,Ib | VPROTW Vo,Wo,Ib | VPROTD Vo,Wo,Ib | VPROTQ Vo,Wo,Ib | | | |


Table A-26. XOP Opcode Map 8h, Low Nibble = [8h:Fh] 
XOP.pp | Opcode | x8 | x9 | xA | xB | xC | xD | xE | xF
n/a | 0x-7x | | | | | | | |
00 | 8x | | | | | | | VPMACSSDD Vo,Ho,Wo,Lo | VPMACSSDQH Vo,Ho,Wo,Lo
00 | 9x | | | | | | | VPMACSDD Vo,Ho,Wo,Lo | VPMACSDQH Vo,Ho,Wo,Lo
n/a | Ax-Bx | | | | | | | |
00 | Cx | | | | | VPCOMccB¹ Vo,Ho,Wo,Ib | VPCOMccW¹ Vo,Ho,Wo,Ib | VPCOMccD¹ Vo,Ho,Wo,Ib | VPCOMccQ¹ Vo,Ho,Wo,Ib
00 | Dx | | | | | | | |
00 | Ex | | | | | VPCOMccUB¹ Vo,Ho,Wo,Ib | VPCOMccUW¹ Vo,Ho,Wo,Ib | VPCOMccUD¹ Vo,Ho,Wo,Ib | VPCOMccUQ¹ Vo,Ho,Wo,Ib
00 | Fx | | | | | | | |
Note 1: The condition codes are LT, LE, GT, GE, EQ, NEQ, FALSE, and TRUE. They are encoded via Ib, using 00...07h.


Table A-27. XOP Opcode Map 9h, Low Nibble = [0h:7h] 


XOP.pp 


Opcode 


Ox 


1x 


2x-7x 


8x 


Dx 


Ex 


Fx 


XOP group #1 


XOP group #2 


XOP group #3 


VFRCZPS 
Vx,Wx 


VPROTB 
Vo,Wo,Ho (W=0) 


Vo,Ho,Wo (W=1) 


VFRCZPD 
Vx,Wx 


VPROTW 
Vo,Wo,Ho (W=0) 
Vo,Ho,Wo (W=1) 


VPHADDBW 
Vo,Wo 


VPHADDUBWD 
Vo,Wo 


VPHSUBBW 
Vo,Wo 


VFRCZSS 
Vq,Wss 


VPROTD 
Vo,Wo,Ho (W=0) 
Vo,Ho,Wo (W=1) 


VPHADDBD 
Vo,Wo 


VPHADDUBD 
Vo,Wo 


VPHSUBWD 
Vo,Wo 


VFRCZSD 


VPROTQ 
Vo,Wo,Ho (W=0) 
Vo,Ho,Wo (W=1) 


VPHADDBQ 
Vo,Wo 


VPHADDUBQ 
Vo,Wo 


VPHSUBDQ 
Vo,Wo 


VPSHLB 
Vo,Wo,Ho (W=0) 
Vo,Ho,Wo (W=1) 


VPSHLW 
Vo,Wo,Ho (W=0) 
Vo,Ho,Wo (W=1) 


VPSHLD 
Vo,Wo,Ho (W=0) 
Vo,Ho,Wo (W=1) 


VPHADDWD 
Vo,Wo 


VPHADDUWD 


VPSHLQ 
Vo,Wo,Ho (W=0) 
Vo,Ho,Wo (W=1) 


VPHADDWQ 
Vo,Wo 


VPHADDUWQ 
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Table A-28. XOP Opcode Map 9h, Low Nibble = [8h:Fh] 


XOP.pp}| Opcode x8 x9 XA xB xC xD XE XF 
sel 


VPSHAB VPSHAW VPSHAD VPSHAQ 
Vo,Wo,Ho (W=0) Vo,Wo,Ho (W=0) Vo,Wo,Ho (W=0) Vo,Wo,Ho (W=0) 
Vo,Ho,Wo (W=1) Vo,Ho,Wo (W=1) Vo,Ho,Wo (W=1) Vo,Ho,Wo (W=1) 


VPHADDDQ 
Vo,Wo 


VPHADDUDQ 
Vo,Wo 


Een | group #4 
Gy,Ey,Id 


| xr | Fx 


Table A-30. XOP Opcode Map Ah, Low Nibble = [8h:Fh]


XOP.pp | Opcode


Opcodes Reserved 


Table A-31. XOP Opcode Groups


ModRM.reg 


BLCFILL BLSFILL T1MSKC
By, Ey By, Ey , , By, Ey 
BLCMSK 
By, Ey 


LWPINS LWPVAL 
By,Ed,Id By,Ed,Id


LLWPCB SLWPCB 
Ry 
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A.2 Operand Encodings 


An operand is data that affects or is affected by the execution of an instruction. Operands may be
located in registers, memory, or I/O ports. For some instructions, the location of one or more operands
is implicitly specified based on the opcode alone. However, for most instructions, operands are
specified using bytes that immediately follow the opcode byte. These bytes are designated the
mode-register-memory (ModRM) byte, the scale-index-base (SIB) byte, the displacement byte(s), and the
immediate byte(s). The SIB, displacement, and immediate bytes are optional: their presence depends
on the instruction and, for instructions that reference memory, on the memory addressing mode.


The following sections describe the encoding of the ModRM and SIB bytes in various processor 
modes. 


A.2.1 ModRM Operand References 


Figure A-2 below shows the format of the ModRM byte. There are three fields—mod, reg, and r/m. 
The reg field is normally used to specify a register-based operand. The mod and r/m fields together 
provide a 5-bit field, augmented in 64-bit mode by the R and B bits of a REX, VEX, or XOP prefix, 
normally used to specify the location of a second memory- or register-based operand and, for a 
memory-based operand, the addressing mode. 


As described in “Encoding Extensions Using the ModRM Byte” on page 515, certain instructions use 
either the reg field, the r/m field, or the entire ModRM byte to extend the opcode byte in the encoding 
of the instruction operation. 


Bits:   7  6 | 5  4  3 | 2  1  0
        mod  |   reg   |   r/m       ModRM

reg: the REX.R, VEX.R, or XOP.R bit extends this field to 4 bits
r/m: the REX.B, VEX.B, or XOP.B bit extends this field to 4 bits


Figure A-2. ModRM-Byte Format
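
The field layout in Figure A-2 can be illustrated with a short decoder. This is a minimal sketch rather than a complete operand decoder: the names `ModRM` and `decode_modrm` are hypothetical, the `ext_r` and `ext_b` arguments are assumed to be the already-decoded logical values of the R and B prefix bits, and the sketch does not special-case the r/m = 100b encoding that selects an SIB byte.

```python
from typing import NamedTuple


class ModRM(NamedTuple):
    mod: int  # bits 7:6
    reg: int  # bits 5:3, extended to 4 bits by the R prefix bit
    rm: int   # bits 2:0, extended to 4 bits by the B prefix bit


def decode_modrm(byte: int, ext_r: int = 0, ext_b: int = 0) -> ModRM:
    """Split a ModRM byte into its three fields (illustrative helper)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModRM must be a single byte")
    mod = byte >> 6
    reg = ((byte >> 3) & 0b111) | ((ext_r & 1) << 3)
    rm = (byte & 0b111) | ((ext_b & 1) << 3)
    return ModRM(mod, reg, rm)
```

For instance, the byte D8h splits into mod = 11b, reg = 011b, and r/m = 000b; with both extension bits set, 45h gives reg = 8 and r/m = 13.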


The two sections below describe the ModRM operand encodings, first for 16-bit references and then 
for 32-bit and 64-bit references. 


16-Bit Register and Memory References. Table A-32 shows the notation and encoding 
conventions for register references using the ModRM reg field. This table is comparable to Table A-34 
on page 548 but applies only when the address-size is 16-bit. Table A-33 on page 546 shows the 
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notation and encoding conventions for 16-bit memory references using the ModRM byte. This table is 
comparable to Table A-35 on page 549. 


Table A-32. ModRM reg Field Encoding, 16-Bit Addressing 


Mnemonic Notation | ModRM reg Field
 | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
reg8 | AL | CL | DL | BL | AH | CH | DH | BH
reg16 | AX | CX | DX | BX | SP | BP | SI | DI
reg32 | EAX | ECX | EDX | EBX | ESP | EBP | ESI | EDI
mmx | MMX0 | MMX1 | MMX2 | MMX3 | MMX4 | MMX5 | MMX6 | MMX7
xmm | XMM0 | XMM1 | XMM2 | XMM3 | XMM4 | XMM5 | XMM6 | XMM7
ymm | YMM0 | YMM1 | YMM2 | YMM3 | YMM4 | YMM5 | YMM6 | YMM7
sReg | ES | CS | SS | DS | FS | GS | invalid | invalid
cReg | CR0 | CR1 | CR2 | CR3 | CR4 | CR5 | CR6 | CR7
dReg | DR0 | DR1 | DR2 | DR3 | DR4 | DR5 | DR6 | DR7
Table A-33. ModRM Byte Encoding, 16-Bit Addressing 
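Effective Address | ModRM mod Field (binary) | ModRM reg Field¹: Complete ModRM Byte (hex) | ModRM r/m Field (binary)
 | | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7 |
[BX] + [SI] | 00 | 00 | 08 | 10 | 18 | 20 | 28 | 30 | 38 | 000
[BX] + [DI] | 00 | 01 | 09 | 11 | 19 | 21 | 29 | 31 | 39 | 001
[BP] + [SI] | 00 | 02 | 0A | 12 | 1A | 22 | 2A | 32 | 3A | 010
[BP] + [DI] | 00 | 03 | 0B | 13 | 1B | 23 | 2B | 33 | 3B | 011
[SI] | 00 | 04 | 0C | 14 | 1C | 24 | 2C | 34 | 3C | 100
[DI] | 00 | 05 | 0D | 15 | 1D | 25 | 2D | 35 | 3D | 101
disp16 | 00 | 06 | 0E | 16 | 1E | 26 | 2E | 36 | 3E | 110
[BX] | 00 | 07 | 0F | 17 | 1F | 27 | 2F | 37 | 3F | 111
Notes:
1. See Table A-32 for complete specification of ModRM “reg” field.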
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Table A-33. ModRM Byte Encoding, 16-Bit Addressing (continued) 


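Effective Address | ModRM mod Field (binary) | ModRM reg Field¹: Complete ModRM Byte (hex) | ModRM r/m Field (binary)
 | | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7 |
[BX] + [SI] + disp8 | 01 | 40 | 48 | 50 | 58 | 60 | 68 | 70 | 78 | 000
[BX] + [DI] + disp8 | 01 | 41 | 49 | 51 | 59 | 61 | 69 | 71 | 79 | 001
[BP] + [SI] + disp8 | 01 | 42 | 4A | 52 | 5A | 62 | 6A | 72 | 7A | 010
[BP] + [DI] + disp8 | 01 | 43 | 4B | 53 | 5B | 63 | 6B | 73 | 7B | 011
[SI] + disp8 | 01 | 44 | 4C | 54 | 5C | 64 | 6C | 74 | 7C | 100
[DI] + disp8 | 01 | 45 | 4D | 55 | 5D | 65 | 6D | 75 | 7D | 101
[BP] + disp8 | 01 | 46 | 4E | 56 | 5E | 66 | 6E | 76 | 7E | 110
[BX] + disp8 | 01 | 47 | 4F | 57 | 5F | 67 | 6F | 77 | 7F | 111
[BX] + [SI] + disp16 | 10 | 80 | 88 | 90 | 98 | A0 | A8 | B0 | B8 | 000
[BX] + [DI] + disp16 | 10 | 81 | 89 | 91 | 99 | A1 | A9 | B1 | B9 | 001
[BP] + [SI] + disp16 | 10 | 82 | 8A | 92 | 9A | A2 | AA | B2 | BA | 010
[BP] + [DI] + disp16 | 10 | 83 | 8B | 93 | 9B | A3 | AB | B3 | BB | 011
[SI] + disp16 | 10 | 84 | 8C | 94 | 9C | A4 | AC | B4 | BC | 100
[DI] + disp16 | 10 | 85 | 8D | 95 | 9D | A5 | AD | B5 | BD | 101
[BP] + disp16 | 10 | 86 | 8E | 96 | 9E | A6 | AE | B6 | BE | 110
[BX] + disp16 | 10 | 87 | 8F | 97 | 9F | A7 | AF | B7 | BF | 111
AL/AX/EAX/MMX0/XMM0/YMM0 | 11 | C0 | C8 | D0 | D8 | E0 | E8 | F0 | F8 | 000
CL/CX/ECX/MMX1/XMM1/YMM1 | 11 | C1 | C9 | D1 | D9 | E1 | E9 | F1 | F9 | 001
DL/DX/EDX/MMX2/XMM2/YMM2 | 11 | C2 | CA | D2 | DA | E2 | EA | F2 | FA | 010
BL/BX/EBX/MMX3/XMM3/YMM3 | 11 | C3 | CB | D3 | DB | E3 | EB | F3 | FB | 011
AH/SP/ESP/MMX4/XMM4/YMM4 | 11 | C4 | CC | D4 | DC | E4 | EC | F4 | FC | 100
CH/BP/EBP/MMX5/XMM5/YMM5 | 11 | C5 | CD | D5 | DD | E5 | ED | F5 | FD | 101
DH/SI/ESI/MMX6/XMM6/YMM6 | 11 | C6 | CE | D6 | DE | E6 | EE | F6 | FE | 110
BH/DI/EDI/MMX7/XMM7/YMM7 | 11 | C7 | CF | D7 | DF | E7 | EF | F7 | FF | 111
Notes:
1. See Table A-32 for complete specification of ModRM “reg” field.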


Register and Memory References for 32-Bit and 64-Bit Addressing. Table A-34 on
page 548 shows the encoding for register references using the ModRM reg field. The first ten rows of
Table A-34 show references when the REX.R bit is cleared to 0, and the last ten rows show references
when the REX.R bit is set to 1. In this table, entries under the Mnemonic Notation heading correspond
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to register notation described in “Mnemonic Syntax” on page 53, and the /r notation under the ModRM 
reg Field heading corresponds to that described in “Opcode Syntax” on page 56. 


Table A-34. ModRM reg Field Encoding, 32-Bit and 64-Bit Addressing 


Mnemonic Notation | REX.R Bit | ModRM reg Field
 | | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
reg8 | 0 | AL | CL | DL | BL | AH/SPL | CH/BPL | DH/SIL | BH/DIL
reg16 | 0 | AX | CX | DX | BX | SP | BP | SI | DI
reg32 | 0 | EAX | ECX | EDX | EBX | ESP | EBP | ESI | EDI
reg64 | 0 | RAX | RCX | RDX | RBX | RSP | RBP | RSI | RDI
mmx | 0 | MMX0 | MMX1 | MMX2 | MMX3 | MMX4 | MMX5 | MMX6 | MMX7
xmm | 0 | XMM0 | XMM1 | XMM2 | XMM3 | XMM4 | XMM5 | XMM6 | XMM7
ymm | 0 | YMM0 | YMM1 | YMM2 | YMM3 | YMM4 | YMM5 | YMM6 | YMM7
sReg | 0 | ES | CS | SS | DS | FS | GS | invalid | invalid
cReg | 0 | CR0 | CR1 | CR2 | CR3 | CR4 | CR5 | CR6 | CR7
dReg | 0 | DR0 | DR1 | DR2 | DR3 | DR4 | DR5 | DR6 | DR7
reg8 | 1 | R8B | R9B | R10B | R11B | R12B | R13B | R14B | R15B
reg16 | 1 | R8W | R9W | R10W | R11W | R12W | R13W | R14W | R15W
reg32 | 1 | R8D | R9D | R10D | R11D | R12D | R13D | R14D | R15D
reg64 | 1 | R8 | R9 | R10 | R11 | R12 | R13 | R14 | R15
mmx | 1 | MMX0 | MMX1 | MMX2 | MMX3 | MMX4 | MMX5 | MMX6 | MMX7
xmm | 1 | XMM8 | XMM9 | XMM10 | XMM11 | XMM12 | XMM13 | XMM14 | XMM15
ymm | 1 | YMM8 | YMM9 | YMM10 | YMM11 | YMM12 | YMM13 | YMM14 | YMM15
sReg | 1 | ES | CS | SS | DS | FS | GS | invalid | invalid
cReg | 1 | CR8 | CR9 | CR10 | CR11 | CR12 | CR13 | CR14 | CR15
dReg | 1 | DR8 | DR9 | DR10 | DR11 | DR12 | DR13 | DR14 | DR15


Table A-35 on page 549 shows the encoding for 32-bit and 64-bit memory references using the 
ModRM byte. This table describes 32-bit and 64-bit addressing, with the REX.B bit set or cleared. The 
Effective Address is shown in the two left-most columns, followed by the binary encoding of the 
ModRM-byte mod field, followed by the eight possible hex values of the complete ModRM byte (one 
value for each binary encoding of the ModRM-byte reg field), followed by the binary encoding of the 
ModRM r/m field.


The /0 through /7 notation for the ModRM reg field (bits [5:3]) means that the three-bit field contains a
value from zero (binary 000) to 7 (binary 111).
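
Because the complete ModRM byte is simply the concatenation of the mod, reg, and r/m fields, each row of hex values in the ModRM tables can be regenerated from the field encodings. The sketch below shows that arithmetic; the helper names `modrm_byte` and `modrm_row` are illustrative and not taken from this manual.

```python
def modrm_byte(mod: int, reg: int, rm: int) -> int:
    """Pack the 2-bit mod, 3-bit reg (/0 through /7), and 3-bit r/m fields."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("field value out of range")
    return (mod << 6) | (reg << 3) | rm


def modrm_row(mod: int, rm: int) -> list[str]:
    """Hex values of the complete ModRM byte for /0 through /7."""
    return [f"{modrm_byte(mod, reg, rm):02X}" for reg in range(8)]
```

Calling `modrm_row(0b00, 0b000)` gives the eight bytes for the first table row (mod = 00, r/m = 000), one per value of the reg field.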
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Table A-35. ModRM Byte Encoding, 32-Bit and 64-Bit Addressing 


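Effective Address | | ModRM mod Field (binary) | ModRM reg Field¹: Complete ModRM Byte (hex) | ModRM r/m Field (binary)
REX.B = 0 | REX.B = 1 | | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7 |
[rAX] | [r8] | 00 | 00 | 08 | 10 | 18 | 20 | 28 | 30 | 38 | 000
[rCX] | [r9] | 00 | 01 | 09 | 11 | 19 | 21 | 29 | 31 | 39 | 001
[rDX] | [r10] | 00 | 02 | 0A | 12 | 1A | 22 | 2A | 32 | 3A | 010
[rBX] | [r11] | 00 | 03 | 0B | 13 | 1B | 23 | 2B | 33 | 3B | 011
SIB² | SIB² | 00 | 04 | 0C | 14 | 1C | 24 | 2C | 34 | 3C | 100
[rIP] + disp32 or disp32³ | [rIP] + disp32 or disp32³ | 00 | 05 | 0D | 15 | 1D | 25 | 2D | 35 | 3D | 101
[rSI] | [r14] | 00 | 06 | 0E | 16 | 1E | 26 | 2E | 36 | 3E | 110
[rDI] | [r15] | 00 | 07 | 0F | 17 | 1F | 27 | 2F | 37 | 3F | 111
[rAX] + disp8 | [r8] + disp8 | 01 | 40 | 48 | 50 | 58 | 60 | 68 | 70 | 78 | 000
[rCX] + disp8 | [r9] + disp8 | 01 | 41 | 49 | 51 | 59 | 61 | 69 | 71 | 79 | 001
[rDX] + disp8 | [r10] + disp8 | 01 | 42 | 4A | 52 | 5A | 62 | 6A | 72 | 7A | 010
[rBX] + disp8 | [r11] + disp8 | 01 | 43 | 4B | 53 | 5B | 63 | 6B | 73 | 7B | 011
[SIB] + disp8 | [SIB] + disp8 | 01 | 44 | 4C | 54 | 5C | 64 | 6C | 74 | 7C | 100
[rBP] + disp8 | [r13] + disp8 | 01 | 45 | 4D | 55 | 5D | 65 | 6D | 75 | 7D | 101
[rSI] + disp8 | [r14] + disp8 | 01 | 46 | 4E | 56 | 5E | 66 | 6E | 76 | 7E | 110
[rDI] + disp8 | [r15] + disp8 | 01 | 47 | 4F | 57 | 5F | 67 | 6F | 77 | 7F | 111
[rAX] + disp32 | [r8] + disp32 | 10 | 80 | 88 | 90 | 98 | A0 | A8 | B0 | B8 | 000
[rCX] + disp32 | [r9] + disp32 | 10 | 81 | 89 | 91 | 99 | A1 | A9 | B1 | B9 | 001
[rDX] + disp32 | [r10] + disp32 | 10 | 82 | 8A | 92 | 9A | A2 | AA | B2 | BA | 010
[rBX] + disp32 | [r11] + disp32 | 10 | 83 | 8B | 93 | 9B | A3 | AB | B3 | BB | 011
SIB + disp32 | SIB + disp32 | 10 | 84 | 8C | 94 | 9C | A4 | AC | B4 | BC | 100
[rBP] + disp32 | [r13] + disp32 | 10 | 85 | 8D | 95 | 9D | A5 | AD | B5 | BD | 101
[rSI] + disp32 | [r14] + disp32 | 10 | 86 | 8E | 96 | 9E | A6 | AE | B6 | BE | 110
[rDI] + disp32 | [r15] + disp32 | 10 | 87 | 8F | 97 | 9F | A7 | AF | B7 | BF | 111
Notes:
1. See Table A-34 for complete specification of ModRM “reg” field.
2. If SIB.base = 5, the SIB byte is followed by four-byte disp32 field and addressing mode is absolute.
3. In 64-bit mode, the effective address is [rIP]+disp32. In all other modes, the effective address is disp32. If the
address-size prefix is used in 64-bit mode to override 64-bit addressing, the [RIP]+disp32 effective address is
truncated after computation to 32 bits.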
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Table A-35. ModRM Byte Encoding, 32-Bit and 64-Bit Addressing (continued) 


Effective Address | | ModRM mod Field (binary) | ModRM reg Field¹: Complete ModRM Byte (hex) | ModRM r/m Field (binary)
REX.B = 0 | REX.B = 1 | | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7 |
AL/rAX/MMX0/XMM0/YMM0 | r8/MMX0/XMM8/YMM8 | 11 | C0 | C8 | D0 | D8 | E0 | E8 | F0 | F8 | 000
CL/rCX/MMX1/XMM1/YMM1 | r9/MMX1/XMM9/YMM9 | 11 | C1 | C9 | D1 | D9 | E1 | E9 | F1 | F9 | 001
DL/rDX/MMX2/XMM2/YMM2 | r10/MMX2/XMM10/YMM10 | 11 | C2 | CA | D2 | DA | E2 | EA | F2 | FA | 010
BL/rBX/MMX3/XMM3/YMM3 | r11/MMX3/XMM11/YMM11 | 11 | C3 | CB | D3 | DB | E3 | EB | F3 | FB | 011
AH/SPL/rSP/MMX4/XMM4/YMM4 | r12/MMX4/XMM12/YMM12 | 11 | C4 | CC | D4 | DC | E4 | EC | F4 | FC | 100
CH/BPL/rBP/MMX5/XMM5/YMM5 | r13/MMX5/XMM13/YMM13 | 11 | C5 | CD | D5 | DD | E5 | ED | F5 | FD | 101
DH/SIL/rSI/MMX6/XMM6/YMM6 | r14/MMX6/XMM14/YMM14 | 11 | C6 | CE | D6 | DE | E6 | EE | F6 | FE | 110
BH/DIL/rDI/MMX7/XMM7/YMM7 | r15/MMX7/XMM15/YMM15 | 11 | C7 | CF | D7 | DF | E7 | EF | F7 | FF | 111
Notes:
1. See Table A-34 for complete specification of ModRM “reg” field.
2. If SIB.base = 5, the SIB byte is followed by four-byte disp32 field and addressing mode is absolute.
3. In 64-bit mode, the effective address is [rIP]+disp32. In all other modes, the effective address is disp32. If the
address-size prefix is used in 64-bit mode to override 64-bit addressing, the [RIP]+disp32 effective address is
truncated after computation to 32 bits.
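
Note 3 describes how the mod = 00, r/m = 101 encoding is interpreted in different modes. The following sketch models that rule under two assumptions not stated in the note itself: that the rIP value used is the address of the next instruction, and that disp32 is sign-extended before the addition. The function and parameter names are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def sign_extend32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed displacement."""
    value &= MASK32
    return value - (1 << 32) if value & 0x8000_0000 else value


def disp32_effective_address(disp32: int, next_rip: int = 0, *,
                             long_mode: bool = True,
                             addr32_override: bool = False) -> int:
    """Effective address for ModRM mod=00, r/m=101 (illustrative model)."""
    if not long_mode:
        # Outside 64-bit mode the encoding is an absolute disp32.
        return disp32 & MASK32
    # In 64-bit mode the displacement is relative to rIP.
    ea = (next_rip + sign_extend32(disp32)) & MASK64
    if addr32_override:
        # Address-size override: truncate after the computation.
        ea &= MASK32
    return ea
```

With `next_rip = 0x1000`, a displacement of FFFFFFF0h resolves to 0FF0h in 64-bit mode, whereas outside 64-bit mode the same bytes denote the absolute address FFFFFFF0h.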


A.2.2 SIB Operand References 


Figure A-3 on page 551 shows the format of a scale-index-base (SIB) byte. Some instructions have an 
SIB byte following their ModRM byte to define memory addressing for the complex-addressing 
modes described in “Effective Addresses” in Volume 1. The SIB byte has three fields—scale, index, 
and base—that define the scale factor, index-register number, and base-register number for 32-bit and 
64-bit complex addressing modes. In 64-bit mode, the REX.B and REX.X bits extend the encoding of 
the SIB byte’s base and index fields. 
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Bits: 7 6 5 4 3 2 1 0 


scale (bits 7:6) | index (bits 5:3) | base (bits 2:0)       SIB

index: the REX.X bit of a REX prefix can extend this field to 4 bits
base: the REX.B bit of a REX prefix can extend this field to 4 bits


Figure A-3. SIB Byte Format


Table A-36 shows the encodings for the SIB byte’s base field, which specifies the base register for 
addressing. Table A-37 on page 552 shows the encodings for the effective address referenced by a 
complete SIB byte, including its scale and index fields. The /0 through /7 notation for the SIB base 
field means that the three-bit field contains a value between zero (binary 000) and 7 (binary 111). 
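
As a rough illustration of the SIB layout described above, the sketch below splits an SIB byte into its scale factor, index-register number, and base-register number. The name `decode_sib` is hypothetical, the `rex_x` and `rex_b` arguments are assumed to be the logical prefix bit values, and the sketch deliberately leaves the mod-dependent meaning of base = 5 to the caller.

```python
from typing import Optional, Tuple


def decode_sib(byte: int, rex_x: int = 0,
               rex_b: int = 0) -> Tuple[int, Optional[int], int]:
    """Return (scale factor, index register number or None, base number)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("SIB must be a single byte")
    scale = 1 << (byte >> 6)                       # 00b->1, 01b->2, 10b->4, 11b->8
    index = ((byte >> 3) & 0b111) | ((rex_x & 1) << 3)
    base = (byte & 0b111) | ((rex_b & 1) << 3)
    if index == 0b0100:
        # index = 100b with REX.X = 0 means "no index register".
        return scale, None, base
    return scale, index, base
```

For example, `decode_sib(0x8A)` describes the address form [rCX] * 4 + [rDX], and `decode_sib(0x24)` describes a plain [rSP] base with no index.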


Table A-36. Addressing Modes: SIB base Field Encoding 


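REX.B Bit | ModRM mod Field | SIB base Field
 | | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
0 | 00 | [rAX] | [rCX] | [rDX] | [rBX] | [rSP] | disp32 | [rSI] | [rDI]
0 | 01 | [rAX] | [rCX] | [rDX] | [rBX] | [rSP] | [rBP] + disp8 | [rSI] | [rDI]
0 | 10 | [rAX] | [rCX] | [rDX] | [rBX] | [rSP] | [rBP] + disp32 | [rSI] | [rDI]
1 | 00 | [r8] | [r9] | [r10] | [r11] | [r12] | disp32 | [r14] | [r15]
1 | 01 | [r8] | [r9] | [r10] | [r11] | [r12] | [r13] + disp8 | [r14] | [r15]
1 | 10 | [r8] | [r9] | [r10] | [r11] | [r12] | [r13] + disp32 | [r14] | [r15]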
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Table A-37. Addressing Modes: SIB Byte Encoding 


SIB base Field¹
REX.B = 0 | rAX | rCX | rDX | rBX | rSP | note 1 | rSI | rDI
REX.B = 1 | r8 | r9 | r10 | r11 | r12 | note 1 | r14 | r15

Effective Address | | SIB scale Field | SIB index Field | Complete SIB Byte (hex)
REX.X = 0 | REX.X = 1 | | | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
[rAX] + [base] | [r8] + [base] | 00 | 000 | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07
[rCX] + [base] | [r9] + [base] | 00 | 001 | 08 | 09 | 0A | 0B | 0C | 0D | 0E | 0F
[rDX] + [base] | [r10] + [base] | 00 | 010 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17
[rBX] + [base] | [r11] + [base] | 00 | 011 | 18 | 19 | 1A | 1B | 1C | 1D | 1E | 1F
[base] | [r12] + [base] | 00 | 100 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27
[rBP] + [base] | [r13] + [base] | 00 | 101 | 28 | 29 | 2A | 2B | 2C | 2D | 2E | 2F
[rSI] + [base] | [r14] + [base] | 00 | 110 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37
[rDI] + [base] | [r15] + [base] | 00 | 111 | 38 | 39 | 3A | 3B | 3C | 3D | 3E | 3F
[rAX] * 2 + [base] | [r8] * 2 + [base] | 01 | 000 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47
[rCX] * 2 + [base] | [r9] * 2 + [base] | 01 | 001 | 48 | 49 | 4A | 4B | 4C | 4D | 4E | 4F
[rDX] * 2 + [base] | [r10] * 2 + [base] | 01 | 010 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57
[rBX] * 2 + [base] | [r11] * 2 + [base] | 01 | 011 | 58 | 59 | 5A | 5B | 5C | 5D | 5E | 5F
[base] | [r12] * 2 + [base] | 01 | 100 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67
[rBP] * 2 + [base] | [r13] * 2 + [base] | 01 | 101 | 68 | 69 | 6A | 6B | 6C | 6D | 6E | 6F
[rSI] * 2 + [base] | [r14] * 2 + [base] | 01 | 110 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77
[rDI] * 2 + [base] | [r15] * 2 + [base] | 01 | 111 | 78 | 79 | 7A | 7B | 7C | 7D | 7E | 7F
[rAX] * 4 + [base] | [r8] * 4 + [base] | 10 | 000 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87
[rCX] * 4 + [base] | [r9] * 4 + [base] | 10 | 001 | 88 | 89 | 8A | 8B | 8C | 8D | 8E | 8F
[rDX] * 4 + [base] | [r10] * 4 + [base] | 10 | 010 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97
[rBX] * 4 + [base] | [r11] * 4 + [base] | 10 | 011 | 98 | 99 | 9A | 9B | 9C | 9D | 9E | 9F
[base] | [r12] * 4 + [base] | 10 | 100 | A0 | A1 | A2 | A3 | A4 | A5 | A6 | A7
[rBP] * 4 + [base] | [r13] * 4 + [base] | 10 | 101 | A8 | A9 | AA | AB | AC | AD | AE | AF
[rSI] * 4 + [base] | [r14] * 4 + [base] | 10 | 110 | B0 | B1 | B2 | B3 | B4 | B5 | B6 | B7
[rDI] * 4 + [base] | [r15] * 4 + [base] | 10 | 111 | B8 | B9 | BA | BB | BC | BD | BE | BF

Notes:
1. See Table A-36 on page 551 for complete specification of SIB base field.
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Table A-37. Addressing Modes: SIB Byte Encoding (continued) 




SIB base Field? 
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REX.B = 0 | rAX | rCX | rDX | rBX | rSP te rSI_| rDl 
REX.B = 1 | r8 | r9 | r10 | r11 | r12 | note 1 | r14 | r15

Effective Address | | SIB scale Field | SIB index Field | Complete SIB Byte (hex)
REX.X = 0 | REX.X = 1 | | | /0 | /1 | /2 | /3 | /4 | /5 | /6 | /7
[rAX] * 8 + [base] | [r8] * 8 + [base] | 11 | 000 | C0 | C1 | C2 | C3 | C4 | C5 | C6 | C7
[rCX] * 8 + [base] | [r9] * 8 + [base] | 11 | 001 | C8 | C9 | CA | CB | CC | CD | CE | CF
[rDX] * 8 + [base] | [r10] * 8 + [base] | 11 | 010 | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7
[rBX] * 8 + [base] | [r11] * 8 + [base] | 11 | 011 | D8 | D9 | DA | DB | DC | DD | DE | DF
[base] | [r12] * 8 + [base] | 11 | 100 | E0 | E1 | E2 | E3 | E4 | E5 | E6 | E7
[rBP] * 8 + [base] | [r13] * 8 + [base] | 11 | 101 | E8 | E9 | EA | EB | EC | ED | EE | EF
[rSI] * 8 + [base] | [r14] * 8 + [base] | 11 | 110 | F0 | F1 | F2 | F3 | F4 | F5 | F6 | F7
[rDI] * 8 + [base] | [r15] * 8 + [base] | 11 | 111 | F8 | F9 | FA | FB | FC | FD | FE | FF
Notes:
1. See Table A-36 on page 551 for complete specification of SIB base field.
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Appendix B_ General-Purpose Instructions in 
64-Bit Mode 


This appendix provides details of the general-purpose instructions in 64-bit mode and their differences
from legacy and compatibility modes. The appendix covers only the general-purpose instructions
(those described in Chapter 3, “General-Purpose Instruction Reference”). It does not cover the
128-bit media, 64-bit media, or x87 floating-point instructions because those instructions are not affected
by 64-bit mode, except that they can access the extended GPR and XMM registers
when using a REX prefix.


B.1 General Rules for 64-Bit Mode 


In 64-bit mode, the following general rules apply to instructions and their operands: 


• “Promoted to 64 Bit”: If an instruction’s operand size (16-bit or 32-bit) in legacy and
compatibility modes depends on the CS.D bit and the operand-size override prefix, then the
operand-size choices in 64-bit mode are extended from 16-bit and 32-bit to include 64 bits (with a
REX prefix), or the operand size is fixed at 64 bits. Such instructions are said to be “Promoted to
64 bits” in Table B-1. However, byte-operand opcodes of such instructions are not promoted.

• Byte-Operand Opcodes Not Promoted: As stated above in “Promoted to 64 Bit”, byte-operand
opcodes of promoted instructions are not promoted. Those opcodes continue to operate only on
bytes.

• Fixed Operand Size: If an instruction’s operand size is fixed in legacy mode (thus, independent of
CS.D and prefix overrides), that operand size is usually fixed at the same size in 64-bit mode. For
example, CPUID operates on 32-bit operands, irrespective of attempts to override the operand
size.

• Default Operand Size: The default operand size for most instructions is 32 bits, and a REX prefix
must be used to change the operand size to 64 bits. However, two groups of instructions default to
64-bit operand size and do not need a REX prefix: (1) near branches and (2) all instructions, except
far branches, that implicitly reference the RSP. See Table B-5 on page 583 for a list of all
instructions that default to 64-bit operand size.

• Zero-Extension of 32-Bit Results: Operations on 32-bit operands in 64-bit mode zero-extend the
high 32 bits of 64-bit GPR destination registers.

• No Extension of 8-Bit and 16-Bit Results: Operations on 8-bit and 16-bit operands in 64-bit
mode leave the high 56 or 48 bits, respectively, of 64-bit GPR destination registers unchanged.

• Shift and Rotate Counts: When the operand size is 64 bits, shifts and rotates use one additional
bit (6 bits total) to specify shift-count or rotate-count, allowing 64-bit shifts and rotates.

• Immediates: The maximum size of immediate operands is 32 bits, except that 64-bit immediates
can be MOVed into 64-bit GPRs. Immediates that are less than 64 bits are a maximum of 32 bits,
and are sign-extended to 64 bits during use.
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¢ Displacements and Offsets: The maximum size of an address displacement or offset is 32 bits, 
except that 64-bit offsets can be used by specific MOV opcodes that read or write AL or rAX. 
Displacements and offsets that are less than 64 bits are a maximum of 32 bits, and are sign- 
extended to 64 bits during use. 


¢ Undefined High 32 Bits After Mode Change: The processor does not preserve the upper 32 bits 


of 


the 64-bit GPRs across switches from 64-bit mode to compatibility or legacy modes. In 


compatibility or legacy mode, the upper 32 bits of the GPRs are undefined and not accessible to 
software. 
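
The register-update rules in the list above (zero-extension of 32-bit results, preservation of the high bits for 8-bit and 16-bit results, the 6-bit shift count, and sign-extension of immediates and displacements) can be illustrated with a small Python model. This is only a sketch for illustration, not part of the architecture definition: the function names are hypothetical, the 8-bit case models only low-byte registers such as AL (not AH, BH, CH, DH), and the 5-bit count mask used for operand sizes below 64 bits is the usual legacy behavior that the "one additional bit" wording implies.

```python
MASK64 = (1 << 64) - 1


def write_gpr(old: int, result: int, size: int) -> int:
    """Return the 64-bit GPR value after writing a `size`-bit result.

    `old` is the previous 64-bit register content.
    """
    if size == 64:
        return result & MASK64
    if size == 32:
        # 32-bit results are zero-extended: the high 32 bits are cleared.
        return result & 0xFFFF_FFFF
    if size in (8, 16):
        # 8-bit and 16-bit results leave the high 56 or 48 bits unchanged.
        low = (1 << size) - 1
        return (old & ~low & MASK64) | (result & low)
    raise ValueError("operand size must be 8, 16, 32, or 64")


def shift_count(count: int, size: int) -> int:
    """Mask a shift or rotate count: 6 bits for 64-bit operands, else 5 bits."""
    return count & (0x3F if size == 64 else 0x1F)


def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide immediate or displacement to 64 bits."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK64
```

For example, writing a 32-bit result into a register that previously held all ones clears the upper half, while an 8-bit write to the same register changes only the low byte.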


B.2 Operation and Operand Size in 64-Bit Mode


Table B-1 lists the integer instructions, showing operand size in 64-bit mode and the state of the high
32 bits of destination registers when 32-bit operands are used. Opcodes, such as byte-operand versions
of several instructions, that do not appear in Table B-1 are covered by the general rules described in
“General Rules for 64-Bit Mode” on page 555.
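
As a reading aid for the "Default Operand Size" column of Table B-1, the following Python sketch shows one way to compute the effective operand size of a non-byte opcode in 64-bit mode from the rules in "General Rules for 64-Bit Mode". It is a simplified illustration under stated assumptions: the function and parameter names are hypothetical, the REX prefix is reduced to its W bit, and REX.W is assumed to take precedence over the 66h prefix when both are present.

```python
from typing import Optional


def effective_operand_size(rex_w: bool,
                           prefix_66: bool,
                           default_64: bool = False,
                           fixed: Optional[int] = None) -> int:
    """Effective operand size, in bits, of a non-byte opcode in 64-bit mode.

    rex_w      -- a REX prefix with the W bit set is present
    prefix_66  -- a 66h operand-size override prefix is present
    default_64 -- the instruction defaults to 64-bit operand size
                  (near branches, instructions that implicitly use RSP)
    fixed      -- operand size of a fixed-size instruction, if any
    """
    if fixed is not None:
        # Fixed operand size: operand-size overrides are silently ignored.
        return fixed
    if rex_w:
        # 64-bit operand size requested explicitly (assumed to win over 66h).
        return 64
    if prefix_66:
        # 66h selects 16 bits; for default-64 instructions there is
        # no way to select a 32-bit operand size.
        return 16
    return 64 if default_64 else 32
```

With this model, an ADD with no prefixes operates on 32 bits, the same opcode with REX.W set operates on 64 bits, and a near CALL is 64 bits by default with no 32-bit encoding available.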


Table B-1. Operations and Operands in 64-Bit Mode

Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴
AAA - ASCII Adjust after Addition: 37 | INVALID IN 64-BIT MODE (invalid-opcode exception)
AAD - ASCII Adjust AX before Division: D5 | INVALID IN 64-BIT MODE (invalid-opcode exception)
AAM - ASCII Adjust AX after Multiply: D4 | INVALID IN 64-BIT MODE (invalid-opcode exception)
AAS - ASCII Adjust AL after Subtraction: 3F | INVALID IN 64-BIT MODE (invalid-opcode exception)


Notes:
1. See “General Rules for 64-Bit Mode” on page 555, for opcodes that do not appear in this table.
2. The type of operation, excluding considerations of operand size or extension of results. See “General Rules for 64-Bit Mode” on page 555 for definitions of “Promoted to 64 bits” and related topics.
3. If “Type of Operation” is 64 bits, a REX prefix is needed for 64-bit operand size, unless the instruction size defaults to 64 bits. If the operand size is fixed, operand-size overrides are silently ignored.
4. Special actions in 64-bit mode, in addition to legacy-mode actions. Zero or sign extensions apply only to result operands, not source operands. Unless otherwise stated, 8-bit and 16-bit results leave the high 56 or 48 bits, respectively, of 64-bit destination registers unchanged. Immediates and branch displacements are sign-extended to 64 bits.
5. Any pointer registers (rDI, rSI) or count registers (rCX) are address-sized and default to 64 bits. For 32-bit address size, any pointer and count registers are zero-extended to 64 bits.
6. The default operand size can be overridden to 16 bits with 66h prefix, but there is no 32-bit operand-size override in 64-bit mode.
ADC—Add with Carry: 11, 13, 15, 81 /2, 83 /2 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
ADD—Signed or Unsigned Add: 01, 03, 05, 81 /0, 83 /0 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
AND—Logical AND: 21, 23, 25, 81 /4, 83 /4 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
ARPL - Adjust Requestor Privilege Level: 63 | OPCODE USED as MOVSXD in 64-BIT MODE
BOUND - Check Array Against Bounds: 62 | INVALID IN 64-BIT MODE (invalid-opcode exception)
BSF—Bit Scan Forward: 0F BC | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |


BSR—Bit Scan Reverse: 0F BD | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
BSWAP—Byte Swap: 0F C8 through 0F CF | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Swap all 8 bytes of a 64-bit GPR.
BT—Bit Test: 0F A3, 0F BA /4 | Promoted to 64 bits. | 32 bits | No GPR register results.
BTC—Bit Test and Complement: 0F BB, 0F BA /7 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
BTR—Bit Test and Reset: 0F B3, 0F BA /6 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
BTS—Bit Test and Set: 0F AB, 0F BA /5 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
CALL—Procedure Call Near | See “Near Branches in 64-Bit Mode” in Volume 1.
CALL—Procedure Call Near: E8 | Promoted to 64 bits. | 64 bits | Can't encode.⁶ | RIP = RIP + 32-bit displacement sign-extended to 64 bits.
CALL—Procedure Call Near: FF /2 | Promoted to 64 bits. | 64 bits | Can't encode.⁶ | RIP = 64-bit offset from register or memory.


CALL—Procedure Call Far | See “Branches to 64-Bit Offsets” in Volume 1.
CALL—Procedure Call Far: 9A | INVALID IN 64-BIT MODE (invalid-opcode exception)
CALL—Procedure Call Far: FF /3 | Promoted to 64 bits. | 32 bits | If selector points to a gate, then RIP = 64-bit offset from gate, else RIP = zero-extended 32-bit offset from far pointer referenced in instruction.
CBW, CWDE, CDQE—Convert Byte to Word, Convert Word to Doubleword, Convert Doubleword to Quadword: 98 | Promoted to 64 bits. | 32 bits (size of destination register) | CWDE: Converts word to doubleword. Zero-extends EAX to RAX. | CDQE (new mnemonic): Converts doubleword to quadword. RAX = sign-extended EAX.
CDQ | see CWD, CDQ, CQO
CDQE (new mnemonic) | see CBW, CWDE, CDQE
CWDE | see CBW, CWDE, CDQE
CLC—Clear Carry Flag: F8 | Same as legacy mode. | Not relevant. | No GPR register results.
CLD—Clear Direction Flag: FC | Same as legacy mode. | Not relevant. | No GPR register results.
CLFLUSH—Cache Line Invalidate: 0F AE /7 | Same as legacy mode. | Not relevant. | No GPR register results.
CLGI—Clear Global Interrupt: 0F 01 DD | Same as legacy mode. | Not relevant. | No GPR register results.
CLI—Clear Interrupt Flag: FA | Same as legacy mode. | Not relevant. | No GPR register results.
CLTS—Clear Task-Switched Flag in CR0: 0F 06 | Same as legacy mode. | Not relevant. | No GPR register results.
CMC—Complement Carry Flag: F5 | Same as legacy mode. | Not relevant. | No GPR register results.
CMOVcc—Conditional Move: 0F 40 through 0F 4F | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. This occurs even if the condition is false. |
CMP—Compare: 39, 3B, 3D, 81 /7, 83 /7 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
CMPS, CMPSW, CMPSD, CMPSQ—Compare Strings: A7 | Promoted to 64 bits. | 32 bits | CMPSD: Compare String Doublewords. See footnote⁵ | CMPSQ (new mnemonic): Compare String Quadwords. See footnote⁵
CMPXCHG—Compare and Exchange: 0F B1 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |


CMPXCHG8B—Compare and Exchange Eight Bytes: 0F C7 /1 | Same as legacy mode. | 32 bits. | Zero-extends EDX and EAX to 64 bits. | CMPXCHG16B (new mnemonic): Compare and Exchange 16 Bytes.
CPUID—Processor Identification: 0F A2 | Same as legacy mode. | Operand size fixed at 32 bits. | Zero-extends 32-bit register results to 64 bits.
CQO (new mnemonic) | see CWD, CDQ, CQO
CWD, CDQ, CQO—Convert Word to Doubleword, Convert Doubleword to Quadword, Convert Quadword to Double Quadword: 99 | Promoted to 64 bits. | 32 bits (size of destination register) | CDQ: Converts doubleword to quadword. Sign-extends EAX to EDX. Zero-extends EDX to RDX. RAX is unchanged. | CQO (new mnemonic): Converts quadword to double quadword. Sign-extends RAX to RDX. RAX is unchanged.
DAA - Decimal Adjust AL after Addition: 27 | INVALID IN 64-BIT MODE (invalid-opcode exception)
DAS - Decimal Adjust AL after Subtraction: 2F | INVALID IN 64-BIT MODE (invalid-opcode exception)


DEC—Decrement by 1: FF /1 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
DEC—Decrement by 1: 48 through 4F | OPCODE USED as REX PREFIX in 64-BIT MODE
DIV—Unsigned Divide: F7 /6 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | RDX:RAX contain a 64-bit quotient (RAX) and 64-bit remainder (RDX).
ENTER—Create Procedure Stack Frame: C8 | Promoted to 64 bits. | 64 bits | Can't encode⁶ |
HLT—Halt: F4 | Same as legacy mode. | Not relevant. | No GPR register results.
IDIV—Signed Divide: F7 /7 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | RDX:RAX contain a 64-bit quotient (RAX) and 64-bit remainder (RDX).


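IMUL - Signed Multiply: F7 /5 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | RDX:RAX = RAX * reg/mem64 (i.e., 128-bit result)
IMUL - Signed Multiply: 0F AF | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | reg64 = reg64 * reg/mem64
IMUL - Signed Multiply: 69 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | reg64 = reg/mem64 * imm32
IMUL - Signed Multiply: 6B | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | reg64 = reg/mem64 * imm8
IN—Input From Port: E5, ED | Same as legacy mode. | 32 bits | Zero-extends 32-bit register results to 64 bits.
INC—Increment by 1: FF /0 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
INC—Increment by 1: 40 through 47 | OPCODE USED as REX PREFIX in 64-BIT MODE
INS, INSW, INSD—Input String: 6D | Same as legacy mode. | 32 bits | INSD: Input String Doublewords. No GPR register results. See footnote⁵
INT n—Interrupt to Vector: CD | Promoted to 64 bits. | Not relevant. | See “Long-Mode Interrupt Control Transfers” in Volume 2.
INT3—Interrupt to Debug Vector: CC | Promoted to 64 bits. | Not relevant. | See “Long-Mode Interrupt Control Transfers” in Volume 2.
INTO - Interrupt to Overflow Vector: CE | INVALID IN 64-BIT MODE (invalid-opcode exception)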


INVD—Invalidate Internal Caches: 0F 08 | Same as legacy mode. | Not relevant. | No GPR register results.
INVLPG—Invalidate TLB Entry: 0F 01 /7 | Promoted to 64 bits. | Not relevant. | No GPR register results.
INVLPGA—Invalidate TLB Entry in a Specified ASID | Same as legacy mode. | Not relevant. | No GPR register results.
IRET, IRETD, IRETQ—Interrupt Return: CF | Promoted to 64 bits. | 32 bits | IRETD: Interrupt Return Doubleword. See “Long-Mode Interrupt Control Transfers” in Volume 2. | IRETQ (new mnemonic): Interrupt Return Quadword. See “Long-Mode Interrupt Control Transfers” in Volume 2.
Jcc—Jump Conditional | See “Near Branches in 64-Bit Mode” in Volume 1.
Jcc—Jump Conditional: 70 through 7F | Promoted to 64 bits. | 64 bits | Can't encode.⁶ | RIP = RIP + 8-bit displacement sign-extended to 64 bits.
Jcc—Jump Conditional: 0F 80 through 0F 8F | Promoted to 64 bits. | 64 bits | Can't encode.⁶ | RIP = RIP + 32-bit displacement sign-extended to 64 bits.
JCXZ, JECXZ, JRCXZ—Jump on CX/ECX/RCX Zero: E3 | Promoted to 64 bits. | 64 bits | Can't encode.⁶ | RIP = RIP + 8-bit displacement sign-extended to 64 bits. See footnote⁵




JMP—Jump Near | See “Near Branches in 64-Bit Mode” in Volume 1.
JMP—Jump Near: EB | Promoted to 64 bits. | 64 bits | Can't encode.⁶ | RIP = RIP + 8-bit displacement sign-extended to 64 bits.
JMP—Jump Near: E9 | Promoted to 64 bits. | 64 bits | Can't encode.⁶ | RIP = RIP + 32-bit displacement sign-extended to 64 bits.
JMP—Jump Near: FF /4 | Promoted to 64 bits. | 64 bits | Can't encode.⁶ | RIP = 64-bit offset from register or memory.
JMP—Jump Far | See “Branches to 64-Bit Offsets” in Volume 1.
JMP—Jump Far: EA | INVALID IN 64-BIT MODE (invalid-opcode exception)
JMP—Jump Far: FF /5 | Promoted to 64 bits. | 32 bits | If selector points to a gate, then RIP = 64-bit offset from gate, else RIP = zero-extended 32-bit offset from far pointer referenced in instruction.
LAHF - Load Status Flags into AH Register: 9F | Same as legacy mode. | Not relevant. |
LAR—Load Access Rights Byte: 0F 02 | Same as legacy mode. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
LDS - Load DS Far Pointer: C5 | INVALID IN 64-BIT MODE (invalid-opcode exception)


LEA—Load Effective Address: 8D | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
LEAVE—Delete Procedure Stack Frame: C9 | Promoted to 64 bits. | 64 bits | Can't encode⁶ |
LES - Load ES Far Pointer: C4 | INVALID IN 64-BIT MODE (invalid-opcode exception)
LFENCE—Load Fence: 0F AE /5 | Same as legacy mode. | Not relevant. | No GPR register results.
LFS—Load FS Far Pointer: 0F B4 | Same as legacy mode. | 32 bits | Zero-extends 32-bit register results to 64 bits.
LGDT—Load Global Descriptor Table Register: 0F 01 /2 | Promoted to 64 bits. | Operand size fixed at 64 bits. | No GPR register results. Loads 8-byte base and 2-byte limit.
LGS—Load GS Far Pointer: 0F B5 | Same as legacy mode. | 32 bits | Zero-extends 32-bit register results to 64 bits.
LIDT—Load Interrupt Descriptor Table Register: 0F 01 /3 | Promoted to 64 bits. | Operand size fixed at 64 bits. | No GPR register results. Loads 8-byte base and 2-byte limit.
LLDT—Load Local Descriptor Table Register: 0F 00 /2 | Promoted to 64 bits. | Operand size fixed at 16 bits. | No GPR register results. References 16-byte descriptor to load 64-bit base.
LMSW—Load Machine Status Word: 0F 01 /6 | Same as legacy mode. | Operand size fixed at 16 bits. | No GPR register results.


LODS, LODSW, LODSD, LODSQ—Load String: AD | Promoted to 64 bits. | 32 bits | LODSD: Load String Doublewords. Zero-extends 32-bit register results to 64 bits. See footnote⁵ | LODSQ (new mnemonic): Load String Quadwords. See footnote⁵
LOOP—Loop: E2 | Promoted to 64 bits. | 64 bits | Can't encode.⁶ | RIP = RIP + 8-bit displacement sign-extended to 64 bits. See footnote⁵
LOOPZ, LOOPE—Loop if Zero/Equal: E1 | Promoted to 64 bits. | 64 bits | Can't encode.⁶ | RIP = RIP + 8-bit displacement sign-extended to 64 bits. See footnote⁵
LOOPNZ, LOOPNE—Loop if Not Zero/Equal: E0 | Promoted to 64 bits. | 64 bits | Can't encode.⁶ | RIP = RIP + 8-bit displacement sign-extended to 64 bits. See footnote⁵
LSL—Load Segment Limit: 0F 03 | Same as legacy mode. | 32 bits | Zero-extends 32-bit register results to 64 bits.
LSS—Load SS Segment Register: 0F B2 | Same as legacy mode. | 32 bits | Zero-extends 32-bit register results to 64 bits.
LTR—Load Task Register: 0F 00 /3 | Promoted to 64 bits. | Operand size fixed at 16 bits. | No GPR register results. References 16-byte descriptor to load 64-bit base.
LZCNT—Count Leading Zeros: F3 0F BD | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits.
MFENCE—Memory Fence: 0F AE /6 | Same as legacy mode. | Not relevant. | No GPR register results.
MONITOR—Setup Monitor Address: 0F 01 C8 | Same as legacy mode. | Operand size fixed at 32 bits. | No GPR register results.


MOV—Move: 89, 8B | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. |
MOV—Move: C7 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | 32-bit immediate is sign-extended to 64 bits.
MOV—Move: B8 through BF | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | 64-bit immediate.
MOV—Move: A1 (moffset) | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. Memory offsets are address-sized and default to 64 bits. | Memory offsets are address-sized and default to 64 bits.
MOV—Move: A3 (moffset) | Promoted to 64 bits. | 32 bits | Memory offsets are address-sized and default to 64 bits. | Memory offsets are address-sized and default to 64 bits.
MOV—Move to/from Segment Registers: 8C | Same as legacy mode. | 32 bits | Zero-extends 32-bit register results to 64 bits.
MOV—Move to/from Segment Registers: 8E | Same as legacy mode. | Operand size fixed at 16 bits. | No GPR register results.
MOV(CRn)—Move to/from Control Registers: 0F 22, 0F 20 | Promoted to 64 bits. | Operand size fixed at 64 bits. | The high 32 bits of control registers differ in their writability and reserved status. See “System Resources” in Volume 2 for details.
MOV(DRn)—Move to/from Debug Registers: 0F 21, 0F 23 | Promoted to 64 bits. | Operand size fixed at 64 bits. | The high 32 bits of debug registers differ in their writability and reserved status. See “Debug and Performance Resources” in Volume 2 for details.


Notes:

1. See “General Rules for 64-Bit Mode” on page 555, for opcodes that do not appear in this table.

2. The type of operation, excluding considerations of operand size or extension of results. See “General Rules for 64-Bit Mode” on page 555 for definitions of “Promoted to 64 bits” and related topics.

3. If “Type of Operation” is 64 bits, a REX prefix is needed for 64-bit operand size, unless the instruction size defaults to 64 bits. If the operand size is fixed, operand-size overrides are silently ignored.

4. Special actions in 64-bit mode, in addition to legacy-mode actions. Zero or sign extensions apply only to result operands, not source operands. Unless otherwise stated, 8-bit and 16-bit results leave the high 56 or 48 bits, respectively, of 64-bit destination registers unchanged. Immediates and branch displacements are sign-extended to 64 bits.

5. Any pointer registers (rDI, rSI) or count registers (rCX) are address-sized and default to 64 bits. For 32-bit address size, any pointer and count registers are zero-extended to 64 bits.

6. The default operand size can be overridden to 16 bits with 66h prefix, but there is no 32-bit operand-size override in 64-bit mode.

Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| MOVD—Move Doubleword or Quadword: 0F 6E, 0F 7E | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | |
| MOVD—Move Doubleword or Quadword: 66 0F 6E, 66 0F 7E | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 128 bits. | Zero-extends 64-bit register results to 128 bits. |
| MOVNTI—Move Non-Temporal Doubleword: 0F C3 | Promoted to 64 bits. | 32 bits | No GPR register results. | No GPR register results. |
| MOVS, MOVSW, MOVSD, MOVSQ—Move String: A5 | Promoted to 64 bits. | 32 bits | MOVSD: Move String Doublewords. See footnote 5. | MOVSQ (new mnemonic): Move String Quadwords. See footnote 5. |
| MOVSX—Move with Sign-Extend: 0F BE | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Sign-extends byte to quadword. |
| MOVSX—Move with Sign-Extend: 0F BF | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Sign-extends word to quadword. |
Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| MOVSXD—Move with Sign-Extend Doubleword: 63 | New instruction, available only in 64-bit mode. (In other modes, this opcode is ARPL instruction.) | 32 bits | Zero-extends 32-bit register results to 64 bits. | Sign-extends doubleword to quadword. |
| MOVZX—Move with Zero-Extend: 0F B6 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Zero-extends byte to quadword. |
| MOVZX—Move with Zero-Extend: 0F B7 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Zero-extends word to quadword. |
| MUL—Multiply Unsigned: F7 /4 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | RDX:RAX = RAX * quadword in register or memory. |
| MWAIT—Monitor Wait: 0F 01 C9 | Same as legacy mode. | Operand size fixed at 32 bits. | No GPR register results. | No GPR register results. |
| NEG—Negate Two’s Complement: F7 /3 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | |
| NOP—No Operation: 90 | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| NOT—Negate One’s Complement: F7 /2 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | |


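
The sign-extension and zero-extension behaviour listed for MOVSX, MOVSXD, and MOVZX with 64-bit operand size can be sketched as plain integer arithmetic. The helper names below are hypothetical and only model the resulting value, not the instruction encodings.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value: int, from_bits: int) -> int:
    """Sign-extend the low `from_bits` of `value` to a 64-bit quadword."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    # XOR then subtract propagates the sign bit into the upper bits.
    return ((value ^ sign_bit) - sign_bit) & MASK64

def zero_extend(value: int, from_bits: int) -> int:
    """Zero-extend the low `from_bits` of `value` to a 64-bit quadword."""
    return value & ((1 << from_bits) - 1)
```

Under this model, `sign_extend(x, 8)` and `sign_extend(x, 16)` correspond to the MOVSX rows, `sign_extend(x, 32)` to MOVSXD, and `zero_extend(x, 8)` or `zero_extend(x, 16)` to MOVZX.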
Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| OR—Logical OR: 09, 0B, 0D, 81 /1, 83 /1 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | |
| OUT—Output to Port: E7, EF | Same as legacy mode. | 32 bits | No GPR register results. | No GPR register results. |
| OUTS, OUTSW, OUTSD—Output String: 6F | Same as legacy mode. | 32 bits | Writes doubleword to I/O port. No GPR register results. See footnote 5. | Same as for 32-bit operand size. |
| PAUSE—Pause: F3 90 | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| POP—Pop Stack: 8F /0, 58 through 5F | Promoted to 64 bits. | 64 bits | Cannot encode.⁶ | No GPR register results. |
| POP—Pop (segment register from) Stack: 0F A1 (POP FS), 0F A9 (POP GS) | Same as legacy mode. | 64 bits | Cannot encode.⁶ | No GPR register results. |
| POP—Pop (segment register from) Stack: 1F (POP DS), 07 (POP ES), 17 (POP SS) | INVALID IN 64-BIT MODE (invalid-opcode exception) | | | |
Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| POPA, POPAD—Pop All to GPR Words or Doublewords: 61 | INVALID IN 64-BIT MODE (invalid-opcode exception) | | | |
| POPCNT—Bit Population Count: F3 0F B8 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Zero-extends 32-bit register results to 64 bits. |
| POPF, POPFD, POPFQ—Pop to rFLAGS Word, Doubleword, or Quadword: 9D | Promoted to 64 bits. | 64 bits | Cannot encode.⁶ | POPFQ (new mnemonic): Pops 64 bits off stack, writes low 32 bits into EFLAGS and zero-extends the high 32 bits of RFLAGS. |
| PREFETCH—Prefetch L1 Data-Cache Line: 0F 0D /0 | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| PREFETCHlevel—Prefetch Data to Cache Level level: 0F 18 /0-3 | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| PREFETCHW—Prefetch L1 Data-Cache Line for Write: 0F 0D /1 | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| PUSH—Push onto Stack: FF /6, 50 through 57, 6A, 68 | Promoted to 64 bits. | 64 bits | Cannot encode.⁶ | |


Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| PUSH—Push (segment register) onto Stack: 0F A0 (PUSH FS), 0F A8 (PUSH GS) | Promoted to 64 bits. | 64 bits | Cannot encode.⁶ | |
| PUSH—Push (segment register) onto Stack: 0E (PUSH CS), 1E (PUSH DS), 06 (PUSH ES), 16 (PUSH SS) | INVALID IN 64-BIT MODE (invalid-opcode exception) | | | |
| PUSHA, PUSHAD—Push All to GPR Words or Doublewords: 60 | INVALID IN 64-BIT MODE (invalid-opcode exception) | | | |
| PUSHF, PUSHFD, PUSHFQ—Push rFLAGS Word, Doubleword, or Quadword onto Stack: 9C | Promoted to 64 bits. | 64 bits | Cannot encode.⁶ | PUSHFQ (new mnemonic): Pushes the 64-bit RFLAGS register. |
| RCL—Rotate Through Carry Left: D1 /2, D3 /2, C1 /2 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Uses 6-bit count. |
| RCR—Rotate Through Carry Right: D1 /3, D3 /3, C1 /3 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Uses 6-bit count. |
| RDMSR—Read Model-Specific Register: 0F 32 | Same as legacy mode. | Not relevant. | RDX[31:0] contains MSR[63:32], RAX[31:0] contains MSR[31:0]. Zero-extends 32-bit register results to 64 bits. | Same as for 32-bit operand size. |


Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| RDPMC—Read Performance-Monitoring Counters: 0F 33 | Same as legacy mode. | Not relevant. | RDX[31:0] contains PMC[63:32], RAX[31:0] contains PMC[31:0]. Zero-extends 32-bit register results to 64 bits. | Same as for 32-bit operand size. |
| RDTSC—Read Time-Stamp Counter: 0F 31 | Same as legacy mode. | Not relevant. | RDX[31:0] contains TSC[63:32], RAX[31:0] contains TSC[31:0]. Zero-extends 32-bit register results to 64 bits. | Same as for 32-bit operand size. |
| RDTSCP—Read Time-Stamp Counter and Processor ID: 0F 01 F9 | Same as legacy mode. | Not relevant. | RDX[31:0] contains TSC[63:32], RAX[31:0] contains TSC[31:0]. RCX[31:0] contains the TSC_AUX MSR C000_0103h[31:0]. Zero-extends 32-bit register results to 64 bits. | Same as for 32-bit operand size. |
| REP INS—Repeat Input String: F3 6D | Same as legacy mode. | 32 bits | Reads doubleword I/O port. See footnote 5. | Same as for 32-bit operand size. |
| REP LODS—Repeat Load String: F3 AD | Promoted to 64 bits. | 32 bits | Zero-extends EAX to 64 bits. See footnote 5. | See footnote 5. |
| REP MOVS—Repeat Move String: F3 A5 | Promoted to 64 bits. | 32 bits | No GPR register results. See footnote 5. | Same as for 32-bit operand size. |
| REP OUTS—Repeat Output String to Port: F3 6F | Same as legacy mode. | 32 bits | Writes doubleword to I/O port. No GPR register results. See footnote 5. | Same as for 32-bit operand size. |
| REP STOS—Repeat Store String: F3 AB | Promoted to 64 bits. | 32 bits | No GPR register results. See footnote 5. | Same as for 32-bit operand size. |
| REPx CMPS—Repeat Compare String: F3 A7 | Promoted to 64 bits. | 32 bits | No GPR register results. See footnote 5. | Same as for 32-bit operand size. |


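
The RDMSR, RDPMC, RDTSC, and RDTSCP entries all return a 64-bit quantity split across RDX[31:0] and RAX[31:0], with the upper halves of both registers zeroed. A minimal sketch of how software would recombine the two halves follows; the function name `combine_rdx_rax` is hypothetical.

```python
def combine_rdx_rax(rdx: int, rax: int) -> int:
    """Rebuild the 64-bit value from RDX[31:0] (high half) and RAX[31:0] (low half)."""
    high = rdx & 0xFFFF_FFFF  # only bits 31:0 carry data; bits 63:32 are zero
    low = rax & 0xFFFF_FFFF
    return (high << 32) | low
```

Masking both inputs is redundant when the registers come straight from one of these instructions, but it keeps the helper correct for arbitrary inputs.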
Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| REPx SCAS—Repeat Scan String: F3 AF | Promoted to 64 bits. | 32 bits | No GPR register results. See footnote 5. | Same as for 32-bit operand size. |
| RET—Return from Call Near: C2, C3 | Promoted to 64 bits. See “Near Branches in 64-Bit Mode” in Volume 1. | 64 bits | Cannot encode.⁶ | No GPR register results. |
| RET—Return from Call Far: CB, CA | Promoted to 64 bits. | 32 bits | See “Control Transfers” in Volume 1 and “Control-Transfer Privilege Checks” in Volume 2. | Same as for 32-bit operand size. |
| ROL—Rotate Left: D1 /0, D3 /0, C1 /0 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Uses 6-bit count. |
| ROR—Rotate Right: D1 /1, D3 /1, C1 /1 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Uses 6-bit count. |
| RSM—Resume from System Management Mode: 0F AA | New SMM state-save area. | Not relevant. | See “System-Management Mode” in Volume 2. | Same as for 32-bit operand size. |
| SAHF—Store AH into Flags: 9E | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| SAL—Shift Arithmetic Left: D1 /4, D3 /4, C1 /4 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Uses 6-bit count. |


Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| SAR—Shift Arithmetic Right: D1 /7, D3 /7, C1 /7 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Uses 6-bit count. |
| SBB—Subtract with Borrow: 19, 1B, 1D, 81 /3, 83 /3 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | |
| SCAS, SCASW, SCASD, SCASQ—Scan String: AF | Promoted to 64 bits. | 32 bits | SCASD: Scan String Doublewords. Zero-extends 32-bit register results to 64 bits. See footnote 5. | SCASQ (new mnemonic): Scan String Quadwords. See footnote 5. |
| SFENCE—Store Fence: 0F AE /7 | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| SGDT—Store Global Descriptor Table Register: 0F 01 /0 | Promoted to 64 bits. | Operand size fixed at 64 bits. | No GPR register results. Stores 8-byte base and 2-byte limit. | Same as for 32-bit operand size. |
| SHL—Shift Left: D1 /4, D3 /4, C1 /4 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Uses 6-bit count. |
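
The “Uses 6-bit count” entries for the shift and rotate instructions mean that, with 64-bit operand size, only the low six bits of the count are used. The sketch below illustrates this for a left shift. It assumes the usual 5-bit count mask for 32-bit operand size, which is not stated in this table, and the function names are hypothetical.

```python
def effective_shift_count(count: int, operand_bits: int) -> int:
    """Mask the shift count: 6 bits for 64-bit operands, 5 bits for 32-bit (assumed)."""
    mask = 0x3F if operand_bits == 64 else 0x1F
    return count & mask

def shl(value: int, count: int, operand_bits: int) -> int:
    """Model SHL on a 32-bit or 64-bit operand; flags are not modelled."""
    width_mask = (1 << operand_bits) - 1
    return (value << effective_shift_count(count, operand_bits)) & width_mask
```

A count of 64 therefore behaves like a count of 0 for a 64-bit operand, leaving the value unchanged.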
Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| SHLD—Shift Left Double: 0F A4, 0F A5 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Uses 6-bit count. |
| SHR—Shift Right: D1 /5, D3 /5, C1 /5 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Uses 6-bit count. |
| SHRD—Shift Right Double: 0F AC, 0F AD | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Uses 6-bit count. |
| SIDT—Store Interrupt Descriptor Table Register: 0F 01 /1 | Promoted to 64 bits. | Operand size fixed at 64 bits. | No GPR register results. Stores 8-byte base and 2-byte limit. | Same as for 32-bit operand size. |
| SKINIT—Secure Init and Jump with Attestation: 0F 01 DE | Same as legacy mode. | Not relevant. | Zero-extends 32-bit register results to 64 bits. | |
| SLDT—Store Local Descriptor Table Register: 0F 00 /0 | Same as legacy mode. | 32 bits | Zero-extends 2-byte LDT selector to 64 bits. | Same as for 32-bit operand size. |
| SMSW—Store Machine Status Word: 0F 01 /4 | Same as legacy mode. | 32 bits | Zero-extends 32-bit register results to 64 bits. | Stores 64-bit machine status word (CR0). |
| STC—Set Carry Flag: F9 | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| STD—Set Direction Flag: FD | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |


Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| STGI—Set Global Interrupt Flag: 0F 01 DC | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| STI—Set Interrupt Flag: FB | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| STOS, STOSW, STOSD, STOSQ—Store String: AB | Promoted to 64 bits. | 32 bits | STOSD: Store String Doublewords. See footnote 5. | STOSQ (new mnemonic): Store String Quadwords. See footnote 5. |
| STR—Store Task Register: 0F 00 /1 | Same as legacy mode. | 32 bits | Zero-extends 2-byte TR selector to 64 bits. | Same as for 32-bit operand size. |
| SUB—Subtract: 29, 2B, 2D, 81 /5, 83 /5 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | |
| SWAPGS—Swap GS Register with KernelGSbase MSR: 0F 01 /7 | New instruction, available only in 64-bit mode. (In other modes, this opcode is invalid.) | Not relevant. | See “SWAPGS Instruction” in Volume 2. | Same as for 32-bit operand size. |
| SYSCALL—Fast System Call: 0F 05 | Promoted to 64 bits. | Not relevant. | See “SYSCALL and SYSRET Instructions” in Volume 2 for details. | Same as for 32-bit operand size. |
Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| SYSENTER—System Call: 0F 34 | INVALID IN LONG MODE (invalid-opcode exception) | | | |
| SYSEXIT—System Return: 0F 35 | INVALID IN LONG MODE (invalid-opcode exception) | | | |
| SYSRET—Fast System Return: 0F 07 | Promoted to 64 bits. | 32 bits | See “SYSCALL and SYSRET Instructions” in Volume 2 for details. | Same as for 32-bit operand size. |
| TEST—Test Bits: 85, A9, F7 /0 | Promoted to 64 bits. | 32 bits | No GPR register results. | No GPR register results. |
| UD2—Undefined Operation: 0F 0B | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| VERR—Verify Segment for Reads: 0F 00 /4 | Same as legacy mode. | Operand size fixed at 16 bits. | No GPR register results. | No GPR register results. |
| VERW—Verify Segment for Writes: 0F 00 /5 | Same as legacy mode. | Operand size fixed at 16 bits. | No GPR register results. | No GPR register results. |
| VMLOAD—Load State from VMCB: 0F 01 DA | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| VMMCALL—Call VMM: 0F 01 D9 | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| VMRUN—Run Virtual Machine: 0F 01 D8 | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| VMSAVE—Save State to VMCB: 0F 01 DB | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
Table B-1. Operations and Operands in 64-Bit Mode (continued)

| Instruction and Opcode (hex)¹ | Type of Operation² | Default Operand Size³ | For 32-Bit Operand Size⁴ | For 64-Bit Operand Size⁴ |
| WAIT—Wait for Interrupt: 9B | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| WBINVD—Writeback and Invalidate All Caches: 0F 09 | Same as legacy mode. | Not relevant. | No GPR register results. | No GPR register results. |
| WRMSR—Write to Model-Specific Register: 0F 30 | Same as legacy mode. | Not relevant. | No GPR register results. MSR[63:32] = RDX[31:0], MSR[31:0] = RAX[31:0]. | Same as for 32-bit operand size. |
| XADD—Exchange and Add: 0F C1 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | |
| XCHG—Exchange Register/Memory with Register: 87, 90 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | |
| XOR—Logical Exclusive OR: 31, 33, 35, 81 /6, 83 /6 | Promoted to 64 bits. | 32 bits | Zero-extends 32-bit register results to 64 bits. | |
B.3 Invalid and Reassigned Instructions in 64-Bit Mode

Table B-2 lists instructions that are illegal in 64-bit mode. Attempted use of these instructions generates an invalid-opcode exception (#UD).

Table B-2. Invalid Instructions in 64-Bit Mode

| Mnemonic | Opcode (hex) | Description |
| AAA | 37 | ASCII Adjust After Addition |
| AAD | D5 | ASCII Adjust Before Division |
| AAM | D4 | ASCII Adjust After Multiply |
| AAS | 3F | ASCII Adjust After Subtraction |
| BOUND | 62 | Check Array Bounds |
| CALL (far) | 9A | Procedure Call Far (far absolute) |
| DAA | 27 | Decimal Adjust after Addition |
| DAS | 2F | Decimal Adjust after Subtraction |
| INTO | CE | Interrupt to Overflow Vector |
| JMP (far) | EA | Jump Far (absolute) |
| LDS | C5 | Load DS Far Pointer |
| LES | C4 | Load ES Far Pointer |
| POP DS | 1F | Pop Stack into DS Segment |
| POP ES | 07 | Pop Stack into ES Segment |
| POP SS | 17 | Pop Stack into SS Segment |
| POPA, POPAD | 61 | Pop All to GPR Words or Doublewords |
| PUSH CS | 0E | Push CS Segment Selector onto Stack |
| PUSH DS | 1E | Push DS Segment Selector onto Stack |
| PUSH ES | 06 | Push ES Segment Selector onto Stack |
| PUSH SS | 16 | Push SS Segment Selector onto Stack |
| PUSHA, PUSHAD | 60 | Push All to GPR Words or Doublewords |
| Redundant Grp1 | 82 /2 | Redundant encoding of group1 Eb,Ib |
| SALC | D6 | Set AL According to CF |

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Table B-3 lists instructions that are reassigned to different functions in 64-bit mode. Attempted use of 
these instructions generates the reassigned function. 


Table B-3. Reassigned Instructions in 64-Bit Mode 


Mnemonic Opcode (hex) Description
ARPL 63 Opcode for MOVSXD instruction in 64-bit mode. In all other modes, this is the Adjust Requestor Privilege Level instruction opcode.
DEC and INC 40-4F REX prefixes in 64-bit mode. In all other modes, decrement by 1 and increment by 1.
LDS C5 VEX Prefix. Introduces the VEX two-byte instruction encoding escape sequence.
LES C4 VEX Prefix. Introduces the VEX three-byte instruction encoding escape sequence.


Table B-4 lists instructions that are illegal in long mode. Attempted use of these instructions generates 
an invalid-opcode exception (#UD). 


Table B-4. Invalid Instructions in Long Mode 


Mnemonic Opcode (hex) Description
SYSENTER 0F 34 System Call
SYSEXIT 0F 35 System Return
B.4 Instructions with 64-Bit Default Operand Size 


In 64-bit mode, two groups of instructions default to 64-bit operand size without the need for a REX
prefix:

• Near branches: CALL, Jcc, JrCX, JMP, LOOP, and RET.

• All instructions, except far branches, that implicitly reference the RSP: CALL, ENTER, LEAVE,
POP, PUSH, and RET (CALL and RET are in both groups of instructions).


Table B-5 lists these instructions. 
Table B-5. Instructions Defaulting to 64-Bit Operand Size


Mnemonic | Opcode (hex) | Implicitly Reference RSP | Description
CALL | E8, FF /2 | yes | Call Procedure Near
ENTER | C8 | yes | Create Procedure Stack Frame
Jcc | many | no | Jump Conditional Near
JMP | E9, EB, FF /4 | no | Jump Near
LEAVE | C9 | yes | Delete Procedure Stack Frame
LOOP | E2 | no | Loop
LOOPcc | E0, E1 | no | Loop Conditional
POP reg/mem | 8F /0 | yes | Pop Stack (register or memory)
POP reg | 58-5F | yes | Pop Stack (register)
POP FS | 0F A1 | yes | Pop Stack into FS Segment Register
POP GS | 0F A9 | yes | Pop Stack into GS Segment Register
POPF, POPFD, POPFQ | 9D | yes | Pop to rFLAGS Word, Doubleword, or Quadword
PUSH imm8 | 6A | yes | Push onto Stack (sign-extended byte)
PUSH imm32 | 68 | yes | Push onto Stack (sign-extended doubleword)
PUSH reg/mem | FF /6 | yes | Push onto Stack (register or memory)
PUSH reg | 50-57 | yes | Push onto Stack (register)
PUSH FS | 0F A0 | yes | Push FS Segment Register onto Stack
PUSH GS | 0F A8 | yes | Push GS Segment Register onto Stack
PUSHF, PUSHFD, PUSHFQ | 9C | yes | Push rFLAGS Word, Doubleword, or Quadword onto Stack
RET | C2, C3 | yes | Return From Call (near)


The 64-bit default operand size can be overridden to 16 bits using the 66h operand-size override. 
However, it is not possible to override the operand size to 32 bits because there is no 32-bit operand- 
size override prefix for 64-bit mode. See “Operand-Size Override Prefix” on page 7 for details. 
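
The override rules above can be summarized in a few lines of code. The following Python sketch is illustrative only: the function name `effective_operand_size` and its parameters are hypothetical, and the rule that REX.W takes precedence over the 66h prefix is an assumption taken from the general operand-size rules rather than from this section.

```python
def effective_operand_size(default_64: bool, rex_w: bool = False,
                           prefix_66: bool = False) -> int:
    """Operand size in bits for an instruction executed in 64-bit mode.

    default_64 -- True for the instructions listed in Table B-5.
    rex_w      -- True if a REX prefix with W=1 is present.
    prefix_66  -- True if the 66h operand-size override is present.
    """
    if rex_w:
        return 64  # assumption: REX.W wins over the 66h prefix
    if prefix_66:
        return 16  # 66h selects 16 bits; it never selects 32 bits here
    return 64 if default_64 else 32


if __name__ == "__main__":
    print(effective_operand_size(default_64=True))                  # PUSH reg
    print(effective_operand_size(default_64=True, prefix_66=True))  # 66h PUSH reg
    print(effective_operand_size(default_64=False))                 # ordinary instruction
```

Under this model no prefix combination yields a 32-bit operand size for an instruction with a 64-bit default, which matches the restriction stated above.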


B.5 Single-Byte INC and DEC Instructions in 64-Bit Mode 


In 64-bit mode, the legacy encodings for the 16 single-byte INC and DEC instructions (one for each of 
the eight GPRs) are used to encode the REX prefix values, as described in “REX Prefix” on page 14. 
Therefore, these single-byte opcodes for INC and DEC are not available in 64-bit mode, although they 
are available in legacy and compatibility modes. The functionality of these INC and DEC instructions 
is still available in 64-bit mode, however, using the ModRM forms of those instructions (opcodes FF/0 
and FF/1). 
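
As a rough illustration of the reassignment, the sketch below interprets a byte in the range 40h-4Fh both ways. The helper names `legacy_meaning` and `decode_rex` are hypothetical, and the REX bit layout (0100WRXB) and register numbering are taken from the general instruction-encoding rules, not from this appendix.

```python
# Register numbering used by the single-byte INC/DEC encodings.
GPRS = ["eAX", "eCX", "eDX", "eBX", "eSP", "eBP", "eSI", "eDI"]


def legacy_meaning(byte: int):
    """Meaning of 40h-4Fh in legacy and compatibility modes, else None."""
    if 0x40 <= byte <= 0x47:
        return ("INC", GPRS[byte & 7])
    if 0x48 <= byte <= 0x4F:
        return ("DEC", GPRS[byte & 7])
    return None


def decode_rex(byte: int):
    """Meaning of 40h-4Fh in 64-bit mode: the four REX bits, else None."""
    if not 0x40 <= byte <= 0x4F:
        return None
    return {
        "W": (byte >> 3) & 1,  # 64-bit operand size
        "R": (byte >> 2) & 1,  # extends ModRM.reg
        "X": (byte >> 1) & 1,  # extends SIB.index
        "B": byte & 1,         # extends ModRM.rm, SIB.base, or opcode reg
    }


if __name__ == "__main__":
    print(legacy_meaning(0x48))  # DEC eAX outside 64-bit mode
    print(decode_rex(0x48))      # REX with W=1 in 64-bit mode
```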
B.6 NOP in 64-Bit Mode


Programs written for the legacy x86 architecture commonly use opcode 90h (the XCHG EAX, EAX 
instruction) as a one-byte NOP. In 64-bit mode, the processor treats opcode 90h specially in order to 
preserve this legacy NOP use. Without special handling in 64-bit mode, the instruction would not be a 
true no-operation. Therefore, in 64-bit mode the processor treats XCHG EAX, EAX as a true NOP, 
regardless of operand size. 


This special handling does not apply to the two-byte ModRM form of the XCHG instruction. Unless a 
64-bit operand size is specified using a REX prefix byte, using the two byte form of XCHG to 
exchange a register with itself will not result in a no-operation because the default operation size is 32 
bits in 64-bit mode. 
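
The difference between the two encodings can be modeled with plain integers. The sketch below is a simplified model, not processor behavior captured from hardware: the function names are hypothetical, and it assumes that a 32-bit result written to a GPR in 64-bit mode is zero-extended to 64 bits.

```python
MASK32 = 0xFFFF_FFFF


def opcode_90(rax: int) -> int:
    """Opcode 90h in 64-bit mode: treated as a true NOP, RAX is unchanged."""
    return rax


def xchg_eax_eax_modrm(rax: int) -> int:
    """Two-byte form (87 C0) with the default 32-bit operand size.

    The exchanged value is written back as a 32-bit result, so the
    upper 32 bits of RAX are cleared (assumed zero-extension rule).
    """
    return rax & MASK32


if __name__ == "__main__":
    rax = 0x1234_5678_9ABC_DEF0
    print(hex(opcode_90(rax)))           # upper half preserved
    print(hex(xchg_eax_eax_modrm(rax)))  # upper half cleared
```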


B.7 Segment Override Prefixes in 64-Bit Mode 


In 64-bit mode, the CS, DS, ES, SS segment-override prefixes have no effect. These four prefixes are 
no longer treated as segment-override prefixes in the context of multiple-prefix rules. Instead, they are 
treated as null prefixes. 


The FS and GS segment-override prefixes are treated as true segment-override prefixes in 64-bit 
mode. Use of the FS and GS prefixes causes their respective segment bases to be added to the effective
address calculation. See “FS and GS Registers in 64-Bit Mode” in Volume 2 for details. 




Appendix C Differences Between Long Mode and 
Legacy Mode 


Table C-1 summarizes the major differences between 64-bit mode and legacy protected mode. The 
third column indicates differences between 64-bit mode and legacy mode. The fourth column indicates 
whether that difference also applies to compatibility mode. 


Table C-1. Differences Between Long Mode and Legacy Mode 


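Type | Subject | 64-Bit Mode Difference | Applies To Compatibility Mode?
Application Programming | Addressing | RIP-relative addressing available | no
Application Programming | Data and Address Sizes | Default data size is 32 bits | no
Application Programming | Data and Address Sizes | REX Prefix toggles data size to 64 bits | no
Application Programming | Data and Address Sizes | Default address size is 64 bits | no
Application Programming | Data and Address Sizes | Address size prefix toggles address size to 32 bits | no
Application Programming | Instruction Differences | Various opcodes are invalid or changed in 64-bit mode (see Table B-2 on page 581 and Table B-3 on page 582) | no
Application Programming | Instruction Differences | Various opcodes are invalid in long mode (see Table B-4 on page 582) | yes
Application Programming | Instruction Differences | MOV reg,imm32 becomes MOV reg,imm64 (with REX operand size prefix) | no
Application Programming | Instruction Differences | REX is always enabled | no
Application Programming | Instruction Differences | Direct-offset forms of MOV to or from accumulator become 64-bit offsets | no
Application Programming | Instruction Differences | MOVD extended to MOV 64 bits between MMX registers and long GPRs (with REX operand-size prefix) | no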
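Table C-1. Differences Between Long Mode and Legacy Mode (continued)


Type | Subject | 64-Bit Mode Difference | Applies To Compatibility Mode?
System Programming | x86 Modes | Real and virtual-8086 modes not supported | yes
System Programming | Task Switching | Task switching not supported | yes
System Programming | Addressing | 64-bit virtual addresses | yes
System Programming | Addressing | 4-level paging structures | yes
System Programming | Addressing | PAE must always be enabled | yes
System Programming | Segmentation | CS, DS, ES, SS segment bases are ignored | no
System Programming | Segmentation | CS, DS, ES, FS, GS, SS segment limits are ignored | no
System Programming | Segmentation | CS, DS, ES, SS Segment prefixes are ignored | no
System Programming | Exception and Interrupt Handling | All pushes are 8 bytes | yes
System Programming | Exception and Interrupt Handling | 16-bit interrupt and trap gates are illegal | yes
System Programming | Exception and Interrupt Handling | 32-bit interrupt and trap gates are redefined as 64-bit gates and are expanded to 16 bytes | yes
System Programming | Exception and Interrupt Handling | SS is set to null on stack switch | yes
System Programming | Exception and Interrupt Handling | SS:RSP is pushed unconditionally | yes
System Programming | Call Gates | All pushes are 8 bytes | yes
System Programming | Call Gates | 16-bit call gates are illegal | yes
System Programming | Call Gates | 32-bit call gate type is redefined as 64-bit call gate and is expanded to 16 bytes. | yes
System Programming | Call Gates | SS is set to null on stack switch | yes
System Programming | System-Descriptor Registers | GDT, IDT, LDT, TR base registers expanded to 64 bits | yes
System Programming | System-Descriptor Table Entries and Pseudo-descriptors | LGDT and LIDT use expanded 10-byte pseudo-descriptors. | no
System Programming | System-Descriptor Table Entries and Pseudo-descriptors | LLDT and LTR use expanded 16-byte table entries. | no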
Appendix D Instruction Subsets and CPUID
Feature Flags 


This appendix provides information that can be used to determine if a specific instruction within the 
AMD64 instruction-set architecture (ISA) is supported on a processor. 


Originally the x86 ISA was composed of a set of instructions from the general-purpose and system 
instruction groups. This set forms the base of the AMD64 ISA. As the ISA expanded over time, new 
instructions were added. Each addition constituted either a single instruction or a set of instructions 
and each addition was assigned a specific processor feature flag. 


Although most current processor products support the entire ISA, support for each added instruction or 
instruction subset is optional and must be confirmed by testing the corresponding feature flag. The 
presence of a particular instruction or subset is indicated by the corresponding feature flag being set. A 
feature flag is a single bit value located at a specific bit position within the 32-bit value returned in a 
register as a result of executing the CPUID instruction. 


For more information on using the CPUID instruction, see the instruction reference page for CPUID 
on page 165. For a comprehensive list of processor feature flags accessed using the CPUID 
instruction, see Appendix E, “Obtaining Processor Information Via the CPUID Instruction” on
page 593.
D.1 Instruction Set Overview


The AMD64 ISA can be organized into five instruction groups: 


1. General-purpose instructions 


These instructions operate on the general-purpose registers (GP registers) and can be used at all 
privilege levels. This group includes instructions to load and store the contents of a GP register to 
and from memory, move values between the GP registers, and perform arithmetic and logical 
operations on the contents of the registers. 


2. System instructions 


These instructions provide the means to manipulate the processor operating mode, access 
processor resources, handle program and system errors, and manage system memory. Many of 
these instructions require privilege level 0 to execute. 


3. x87 instructions 


These instructions are available at all privilege levels and include legacy floating-point 
instructions that use the ST(0)-ST(7) stack registers (FPR0-FPR7 physical registers) and
internally use extended precision (80-bit) binary floating-point representation and operations. 


4. 64-bit media Instructions 


These instructions are available at all privilege levels and perform vector operations on packed 
integer and floating-point values held in the 64-bit MMX™ registers. The MMX register set 
overlays the FPR0-FPR7 physical registers. This group is composed of the MMX and 3DNow!™
instruction subsets and was subsequently expanded by the MMX and 3DNow! extensions subsets.


5. SSE instructions 


The SSE instructions operate on packed integer and floating-point values held in the XMM/YMM
registers. SSE includes the original Streaming SIMD Extensions, all the subsequent named SSE 
subsets, and the AVX, XOP, and AES instructions. 


Figure D-1 on page 589 represents the relationship between the five major instruction groups and the 
named instruction subsets. Circles represent the instruction subsets. These include the base instruction 
set labeled “Base Instructions” in the diagram and the named subsets. The diagram omits individual 
optional instructions and some of the minor named instruction subsets. Dashed-line polygons 
represent the instruction groups. 


Note that the 128-bit and 256-bit media instructions are referred to collectively as the Streaming SIMD 
Extensions (SSE). This is also the name of the original SSE subset. In the diagram the original SSE 
subset is labeled “SSE1 Instructions.” Collectively the 64-bit media and the SSE instructions make up 
the single instruction / multiple data (SIMD) group (labeled “SIMD Instructions” in the diagram). 


The overlapping of the SSE and 64-bit media instruction subsets indicates that these subsets share 
some common mnemonics. However, these common mnemonics either have distinct opcodes for each 
subset or they take operands in both the MMX and XMM register sets. 


The horizontal axis of Figure D-1 shows how the subsets have evolved over time. 
D.2 CPUID Feature Flags Related to Instruction Support


Only a subset of the CPUID feature flags provides information related to instruction support. 


The feature flags related to supported instruction subsets are accessed via the standard function 
number 0000_0001h, the extended function number 8000_0001h, and the structured extended 
function number 0000_0007h. 
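
Testing a feature flag reduces to selecting a CPUID function, an optional ECX input, a return register, and a bit position. The Python sketch below shows one way to organize that lookup using a handful of entries copied from Table D-1. It is a sketch under stated assumptions: `FEATURE_FLAGS`, `has_feature`, and `has_fcmov` are hypothetical names, and `cpuid` is a caller-supplied callable (not a real library function) that returns the four result registers as a dictionary.

```python
# (function number, ECX input, return register, bit) for a few Table D-1 rows.
FEATURE_FLAGS = {
    "AVX":  (0x0000_0001, 0, "ECX", 28),  # standard
    "AVX2": (0x0000_0007, 0, "EBX", 5),   # 0000_0007_0
    "ABM":  (0x8000_0001, 0, "ECX", 5),   # extended
    "CMOV": (0x0000_0001, 0, "EDX", 15),  # both
    "FPU":  (0x0000_0001, 0, "EDX", 0),   # both
}


def has_feature(name: str, cpuid) -> bool:
    """Return True if the named feature flag is set.

    `cpuid(function, ecx)` is a hypothetical callable that returns a dict
    with the keys "EAX", "EBX", "ECX", and "EDX".
    """
    function, ecx_in, register, bit = FEATURE_FLAGS[name]
    regs = cpuid(function, ecx_in)
    return bool((regs[register] >> bit) & 1)


def has_fcmov(cpuid) -> bool:
    """FCMOVcc is supported only if x87 and CMOVcc are both supported."""
    return has_feature("FPU", cpuid) and has_feature("CMOV", cpuid)


if __name__ == "__main__":
    # Made-up register values standing in for real CPUID output.
    sample = {
        (0x0000_0001, 0): {"EAX": 0, "EBX": 0, "ECX": 1 << 28, "EDX": (1 << 15) | 1},
        (0x0000_0007, 0): {"EAX": 0, "EBX": 1 << 5, "ECX": 0, "EDX": 0},
        (0x8000_0001, 0): {"EAX": 0, "EBX": 0, "ECX": 0, "EDX": 0},
    }
    fake_cpuid = lambda function, ecx: sample[(function, ecx)]
    for flag in FEATURE_FLAGS:
        print(flag, has_feature(flag, fake_cpuid))
```

Passing the CPUID accessor as a parameter keeps the lookup logic testable on any host, since the instruction itself cannot be issued from portable Python.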


The following table lists all flags related to instruction support. Entries for each flag provide the 
instruction or instruction subset corresponding to the flag, the CPUID function that must be executed 
to access the flag, and the bit position of the flag in the return value. 


Table D-1. Feature Flags for Instruction / Instruction Subset Support 


Feature Flag | Instruction or Subset | CPUID Function¹ | Feature Flag Bit Position²
3DNow | 3DNow! | extended | EDX[31]
3DNowExt | 3DNow! Extensions | extended | EDX[30]
3DNowPrefetch | PREFETCH / PREFETCHW | extended | ECX[8], EDX[29], or EDX[31]
ABM | LZCNT | extended | ECX[5]
ADX | ADCX, ADOX | 0000_0007_0 | EBX[19]
AES | AES | standard | ECX[25]
AVX | AVX | standard | ECX[28]
AVX2 | AVX2 | 0000_0007_0 | EBX[5]
BASE | Base Instruction set | — | —
BMI1 | Bit Manipulation, group 1 | 0000_0007_0 | EBX[3]
BMI2 | Bit Manipulation, group 2 | 0000_0007_0 | EBX[8]
CET_SS | Shadow Stack, CLRSSBSY, INCSSP, RDSSP, RSTORSSP, SAVEPREVSSP, SETSSBSY, WRSS, WRUSS | 0000_0007_0 | ECX[7]
CLFLOPT | CLFLUSHOPT | 0000_0007_0 | EBX[23]
CLFSH | CLFLUSH, CLWB | standard | EDX[19]
CLWB | CLWB | 0000_0007_0 | EBX[24]
CLZERO | CLZERO | 8000_0008 | EBX[0]
CMPXCHG8B | CMPXCHG8B | both | EDX[8]
CMPXCHG16B | CMPXCHG16B | standard | ECX[13]
CMOV | CMOVcc | both | EDX[15]
Notes:

1. standard = Fn0000_0001h; extended = Fn8000_0001h; both means that both standard and extended CPUID
functions return the same feature flag in the same bit position of the return value. For functions of the form
xxxx_xxxx_x, the trailing digit is the value required in ECX.

2. Register and bit position of the return value that corresponds to the feature flag.

3. FCMOVcc instruction is supported if x87 and CMOVcc instructions are both supported.

4. XSAVE (and related) instructions require separate enablement.
Table D-1. Feature Flags for Instruction / Instruction Subset Support (continued)


Feature Flag | Instruction or Subset | CPUID Function¹ | Feature Flag Bit Position²
F16C | F16C | standard | ECX[29]
FMA | FMA | standard | ECX[12]
FMA4 | FMA4 | extended | ECX[16]
FPU | x87 | both | EDX[0]
FSGSBASE | FS and GS base read and write instructions | 0000_0007_0 | EBX[0]
FXSR | FXSAVE / FXRSTOR | both | EDX[24]
INVLPGB | INVLPGB, TLBSYNC | 8000_0008 | EBX[3]
INVPCID | INVPCID | 0000_0007_0 | EBX[10]
LahfSahf | LAHF / SAHF | extended | ECX[0]
LM | Long Mode | extended | EDX[29]
MCOMMIT | MCOMMIT | 8000_0008 | EBX[8]
MMX | MMX | both | EDX[23]
MmxExt | MMX Extensions | extended | EDX[22]
MONITOR | MONITOR / MWAIT | standard | ECX[3]
MONITORX | MONITORX / MWAITX | extended | ECX[29]
MOVBE | MOVBE | standard | ECX[22]
MSR | RDMSR / WRMSR | both | EDX[5]
OSPKE | RDPKRU, WRPKRU | 0000_0007_0 | ECX[4]
PCLMULQDQ | PCLMULQDQ | standard | ECX[1]
POPCNT | POPCNT | standard | ECX[23]
RDPID | RDPID | 0000_0007_0 | ECX[22]
RDPRU | RDPRU | 8000_0008 | EBX[4]
RDRAND | RDRAND | standard | ECX[30]
RDTSCP | RDTSCP | extended | EDX[27]
RDSEED | RDSEED | 0000_0007_0 | EBX[18]
SevEs | VMGEXIT | 8000_001F | EAX[3]
SHA | SHA | 0000_0007_0 | EBX[29]
SKINIT | SKINIT / STGI | extended | ECX[12]
SMAP | CLAC, STAC | 0000_0007_0 | EBX[20]
SNP | PSMASH, PVALIDATE, RMPADJUST, RMPUPDATE | 8000_001F | EAX[4]
SSE | SSE1 | standard | EDX[25]
SSE2 | SSE2 | standard | EDX[26]
Table D-1. Feature Flags for Instruction / Instruction Subset Support (continued)


Feature Flag | Instruction or Subset | CPUID Function¹ | Feature Flag Bit Position²
SSE3 | SSE3 | standard | ECX[0]
SSSE3 | SSSE3 | standard | ECX[9]
SSE4A | SSE4A | extended | ECX[6]
SSE41 | SSE4.1 | standard | ECX[19]
SSE42 | SSE4.2 | standard | ECX[20]
SVM | Secure Virtual Machine | extended | ECX[2]
SysCallSysRet | SYSCALL / SYSRET | extended | EDX[11]
SysEnterSysExit | SYSENTER / SYSEXIT | standard | EDX[11]
TBM | Trailing bit manipulation | extended | ECX[21]
TSC | RDTSC | both | EDX[4]
VAES | VAES 256-bit instructions | 0000_0007_0 | ECX[9]
VPCMULQDQ | VPCLMULQDQ 256-bit instruction | 0000_0007_0 | ECX[10]
WBNOINVD | WBNOINVD | 8000_0008 | EBX[9]
x87 && CMOV | FCMOVcc³ | both | EDX[0] && EDX[15]
XGETBV w/ ECX=1 | XGETBV w/ ECX=1 | 0000_000D_1 | EAX[2]
XOP | XOP | extended | ECX[11]
XSAVE | XSAVE / XRSTOR⁴ | standard | ECX[26]
XSAVEC | XSAVEC | 0000_000D_1 | EAX[1]
XSAVEOPT | XSAVEOPT | 0000_000D_1 | EAX[0]
XSAVES/XRSTORS | XSAVES / XRSTORS | 0000_000D_1 | EAX[3]
Appendix E Obtaining Processor Information Via
the CPUID Instruction 


This appendix specifies the information that software can obtain about the processor on which it is 
running by executing the CPUID instruction. The information in this appendix supersedes the
contents of the CPUID Specification, order #25481, which is now obsolete.


The CPUID instruction is described on page 165. This appendix does not replace the CPUID 
instruction reference information presented there. 


The CPUID instruction behaves much like a function call. Parameters are passed to the instruction via 
registers and on execution the instruction loads specific registers with return values. These return 
values can be interpreted by software based on the field definitions and their assigned meanings. 


The first input parameter is the function number which is passed to the instruction via the EAX 
register. Some functions also accept a second input parameter passed via the ECX register. Values are 
returned via the EAX, EBX, ECX, and EDX registers. Software should not assume that any values 
written to these registers prior to the execution of CPUID instruction will be retained after the 
instruction executes (even those that are marked reserved). 


The description of each return value breaks the value down into one or more named fields which 
represent a bit position or contiguous range of bits. All bit positions that are not defined as fields are 
reserved. The value of bits within reserved ranges cannot be relied upon to be zero. Software must 
mask off all reserved bits in the return value prior to making any value comparisons of represented 
information. 
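
Field extraction and reserved-bit masking can be sketched as follows. The helper names `field` and `mask_defined` are hypothetical. The example value is made up, and uses the layout of CPUID Fn0000_0005_EAX, in which bits 15:0 hold MonLineSizeMin and bits 31:16 are reserved.

```python
def field(value: int, hi: int, lo: int) -> int:
    """Return bits hi:lo of a 32-bit return value, right-aligned."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def mask_defined(value: int, fields) -> int:
    """Clear every bit that is not covered by one of the (hi, lo) ranges."""
    mask = 0
    for hi, lo in fields:
        mask |= ((1 << (hi - lo + 1)) - 1) << lo
    return value & mask


if __name__ == "__main__":
    # Made-up return value: the reserved upper half holds non-zero bits.
    eax = 0xABCD_0040
    print(field(eax, 15, 0))                    # MonLineSizeMin only
    print(hex(mask_defined(eax, [(15, 0)])))    # reserved bits removed
```

Comparing the unmasked value against an expected constant would fail here even though the defined field matches, which is why the mask is applied first.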


This appendix applies to all AMD processors with a family designation of 0Fh or greater.


E.1 Special Notational Conventions 


The following special notation conventions are used in this appendix: 


• The notation (standard throughout this APM) for representing the function number, optional input
parameter, and the information returned is as follows:


CPUID FnXXXX_XXXX_RRR[FieldName]_xYYY.
Where:


- XXXX_XXXX is the function number represented in hexadecimal (passed to the instruction in
EAX).

- RRR is one of {EDX, ECX, EBX, EAX} and represents a register holding a return value.

- YYY represents the optional input parameter passed in the ECX register expressed as a
hexadecimal number. If this parameter is not used, the characters represented by _xYYY are
omitted from the notation.

- FieldName identifies a specific named element of processor information represented by a
specific bit range (1 or more bits wide) within the RRR register.


• The notation CPUID FnXXXX_XXXX_RRR is used when referring to one of the registers that holds
information returned by the instruction.


• The notation CPUID FnXXXX_XXXX or FnXXXX_XXXX is used to refer to a specific function
number.


• Most one-bit fields indicate support or non-support of a specific processor feature. By convention,
(unless otherwise noted) a value of 1 means that the feature is supported by the processor and a 
value of 0 means that the feature is not supported by the processor. 
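
The notation is regular enough to be parsed mechanically. The following Python sketch is one possible reading of the conventions above, not a definition of them: `parse_notation` is a hypothetical helper, and the regular expression assumes the register name, field name, and `_x` suffix always appear in the order shown in the general form.

```python
import re

_NOTATION = re.compile(
    r"^(?:CPUID\s+)?Fn(?P<hi>[0-9A-Fa-f]{4})_(?P<lo>[0-9A-Fa-f]{4})"
    r"(?:_(?P<reg>E[ABCD]X))?"        # optional return register
    r"(?:\[(?P<field>\w+)\])?"        # optional field name
    r"(?:_x(?P<sub>[0-9A-Fa-f]+))?$"  # optional ECX input value
)


def parse_notation(text: str):
    """Split a notation string into (function, register, field, ecx_input).

    Parts that are absent from the string are returned as None.
    """
    match = _NOTATION.match(text.strip())
    if match is None:
        raise ValueError(f"not a CPUID notation: {text!r}")
    function = int(match["hi"] + match["lo"], 16)
    ecx_input = int(match["sub"], 16) if match["sub"] is not None else None
    return function, match["reg"], match["field"], ecx_input


if __name__ == "__main__":
    for sample in ("CPUID Fn0000_0007_EBX_x0",
                   "CPUID Fn0000_0001_EDX[HTT]",
                   "Fn8000_0000"):
        print(sample, "->", parse_notation(sample))
```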


E.2 Standard and Extended Function Numbers 


The CPUID instruction supports two sets or ranges of function numbers: standard and extended. 


• The smallest function number of the standard function range is Fn0000_0000. The largest function
number of the standard function range, for a particular implementation, is returned in CPUID 
Fn0000_0000_EAX. 


• The smallest function number of the extended function range is Fn8000_0000. The largest
function number of the extended function range, for a particular implementation, is returned in 
CPUID Fn8000_0000_EAX. 
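
A range check based on these two values might look like the sketch below. The function name `function_in_range` is hypothetical, and the arguments stand for the values read from CPUID Fn0000_0000_EAX and CPUID Fn8000_0000_EAX. A function number that falls within a range can still be reserved, so this check is necessary but not sufficient.

```python
EXT_BASE = 0x8000_0000  # smallest function number of the extended range


def function_in_range(function: int, largest_std: int, largest_ext: int) -> bool:
    """Return True if `function` lies within the reported standard or extended range."""
    if function >= EXT_BASE:
        return function <= largest_ext
    return function <= largest_std


if __name__ == "__main__":
    # Made-up limits for illustration.
    largest_std, largest_ext = 0x0000_000D, 0x8000_001F
    for fn in (0x0000_0007, 0x0000_0010, 0x8000_0008, 0x8000_0020):
        print(f"Fn{fn:08X}", function_in_range(fn, largest_std, largest_ext))
```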

E.3 Standard Feature Function Numbers 

This section describes each of the defined CPUID functions in the standard range. 

E.3.1 Function 0h—Maximum Standard Function Number and Vendor String 


This function number provides information about the maximum standard function number supported 
on this processor and a string that identifies the vendor of the product. 


CPUID Fn0000_0000_EAX Largest Standard Function Number 


The value returned in EAX provides the largest standard function number supported by this processor. 


Bits |Field Name Description
31:0 |LFuncStd Largest standard function. The largest CPUID standard function input value supported by the processor implementation.


CPUID Fn0000_0000_E[D,C,B]X Processor Vendor 


The values returned in EBX, EDX, and ECX together provide a 12-character string identifying the 
vendor of this processor. Each register supplies 4 characters. The leftmost character of each substring 
is stored in the least significant byte position in the register. The string is the concatenation of the
contents of EBX, EDX, and ECX in left to right order. No null terminator is included in the string. 
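
The byte ordering can be checked with a short sketch. The function name `vendor_string` is hypothetical; the register values in the example are the ones listed in Table E-1. Packing each register as a little-endian 32-bit value places its least significant byte first, which is the leftmost character of that substring.

```python
import struct


def vendor_string(ebx: int, edx: int, ecx: int) -> str:
    """Concatenate EBX, EDX, and ECX (in that order) into the 12-character vendor string."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


if __name__ == "__main__":
    print(vendor_string(ebx=0x6874_7541, edx=0x6974_6E65, ecx=0x444D_4163))
```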


CPUID Fn8000_0000_E[D,C,B]X returns the same values as this function.


Bits |Field Name Description
31:0 |Vendor Four characters of the 12-byte character string (encoded in ASCII) “AuthenticAMD”. See Table E-1 below.


Table E-1. CPUID Fn0000_0000_E[D,C,B]X values 


Register | Value | Description
CPUID Fn0000_0000_EBX | 6874_7541h | The ASCII characters “h t u A”.
CPUID Fn0000_0000_ECX | 444D_4163h | The ASCII characters “D M A c”.
CPUID Fn0000_0000_EDX | 6974_6E65h | The ASCII characters “i t n e”.


E.3.2 Function 1h—Processor and Processor Feature Identifiers 


This function number identifies the processor family, model, and stepping and provides feature 
support information. 


CPUID Fn0000_0001_EAX Family, Model, Stepping Identifiers 


The value returned in EAX provides the family, model, and stepping identifiers. Three values are used 
by software to identify a processor: Family, Model, and Stepping. 


Bits |Field Name Description
31:28 |— Reserved.
27:20 |ExtFamily Processor extended family. See below for definition of Family[7:0].
19:16 |ExtModel Processor extended model. See below for definition of Model[7:0].
15:12 |— Reserved.
11:8 |BaseFamily Base processor family. See below for definition of Family[7:0].
7:4 |BaseModel Base processor model. See below for definition of Model[7:0].
3:0 |Stepping Processor stepping. Processor stepping (revision) for a specific model.


The processor Family identifies one or more processors as belonging to a group that possesses some 
common definition for software or hardware purposes. The Model specifies one instance of a 
processor family. The Stepping identifies a particular version of a specific model. Therefore, Family, 
Model and Stepping, when taken together, form a unique identification or signature for a processor. 


The Family is an 8-bit value and is defined as: Family[7:0] = ({0000b,BaseFamily[3:0]} + 
ExtFamily[7:0]). For example, if BaseFamily[3:0] = Fh and ExtFamily[7:0] = 01h, then Family[7:0] = 
10h. If BaseFamily[3:0] is less than Fh, then ExtFamily is reserved and Family is equal to
BaseFamily[3:0]. 


Model is an 8-bit value and is defined as: Model[7:0] = {ExtModel[3:0],BaseModel[3:0]}. For 
example, if ExtModel[3:0] = Eh and BaseModel[3:0] = 8h, then Model[7:0] = E8h. If BaseFamily[3:0] 
is less than 0Fh, then ExtModel is reserved and Model is equal to BaseModel[3:0].


The value returned by CPUID Fn8000_0001_EAX is equivalent to CPUID Fn0000_0001_EAX. 
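
The Family and Model definitions above translate directly into code. The sketch below is illustrative: `decode_signature` is a hypothetical name, and the sample EAX values are constructed to match the two worked examples in the text (BaseFamily = Fh with ExtFamily = 01h, and ExtModel = Eh with BaseModel = 8h) rather than read from a real processor.

```python
def decode_signature(eax: int):
    """Return (family, model, stepping) from a CPUID Fn0000_0001_EAX value."""
    stepping = eax & 0xF            # bits 3:0
    base_model = (eax >> 4) & 0xF   # bits 7:4
    base_family = (eax >> 8) & 0xF  # bits 11:8
    ext_model = (eax >> 16) & 0xF   # bits 19:16
    ext_family = (eax >> 20) & 0xFF # bits 27:20

    if base_family == 0xF:
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        # The extended fields are reserved when BaseFamily is less than Fh.
        family = base_family
        model = base_model
    return family, model, stepping


if __name__ == "__main__":
    family, model, stepping = decode_signature(0x001E_0F82)
    print(f"Family {family:02X}h, Model {model:02X}h, Stepping {stepping:X}h")
```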


CPUID Fn0000_0001_EBX LocalApicId, LogicalProcessorCount, CLFlush


The value returned in EBX provides miscellaneous information regarding the processor brand, the 
number of logical threads per processor socket, the CLFLUSH instruction, and APIC. 


Bits |Field Name Description
31:24 |LocalApicId Initial local APIC physical ID. The 8-bit value assigned to the local APIC physical ID register at power-up. Some of the bits of LocalApicId represent the core within a processor and other bits represent the processor ID. See the APIC20 “APIC ID” register in the processor BKDG or PPR for details.
23:16 |LogicalProcessorCount Logical processor count. If CPUID Fn0000_0001_EDX[HTT] = 1 then LogicalProcessorCount is the number of logical processors per package. If CPUID Fn0000_0001_EDX[HTT] = 0 then LogicalProcessorCount is reserved. See E.5.1 [Legacy Method].
15:8 |CLFlush CLFLUSH size. Specifies the size of a cache line in quadwords flushed by the CLFLUSH instruction. See “CLFLUSH” in APM3.
7:0 |8BitBrandId 8-bit brand ID. This field, in conjunction with CPUID Fn8000_0001_EBX[BrandId], is used by the system firmware to generate the processor name string. See the appropriate processor revision guide for how to program the processor name
string. 
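
The four EBX fields can be unpacked as shown in the sketch below. The function name `decode_fn1_ebx` and the dictionary keys are hypothetical, and the sample register value is made up. The CLFlush field is converted from quadwords to bytes (one quadword is 8 bytes), and LogicalProcessorCount is reported as None when HTT is 0 because the field is then reserved.

```python
def decode_fn1_ebx(ebx: int, htt: bool) -> dict:
    """Unpack CPUID Fn0000_0001_EBX.

    `htt` is the value of CPUID Fn0000_0001_EDX[HTT], read separately.
    """
    count = (ebx >> 16) & 0xFF
    return {
        "LocalApicId": (ebx >> 24) & 0xFF,
        "LogicalProcessorCount": count if htt else None,
        "CLFlushBytes": ((ebx >> 8) & 0xFF) * 8,  # field is in quadwords
        "8BitBrandId": ebx & 0xFF,
    }


if __name__ == "__main__":
    print(decode_fn1_ebx(0x0210_0800, htt=True))
```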


CPUID Fn0000_0001_ECX Feature Identifiers


The value returned in ECX contains the following miscellaneous feature identifiers: 


Bits |Field Name Description
31 |— RAZ. Reserved for use by hypervisor to indicate guest status.
30 |RDRAND RDRAND instruction support.
29 |F16C Half-precision convert instruction support. See “Half-Precision Floating-Point Conversion” in APM1 and listings for individual F16C instructions in APM5.
28 |AVX AVX instruction support. See APM4.
27 |OSXSAVE XSAVE (and related) instructions are enabled. See “OSXSAVE” in APM2.
26 |XSAVE XSAVE (and related) instructions are supported by hardware. See “XSAVE/XRSTOR Instructions” in APM2.
25 |AES AES instruction support. See “AES Instructions” in APM4.
24 |— Reserved.
23 |POPCNT POPCNT instruction. See “POPCNT” in APM3.
22 |MOVBE MOVBE instruction support.
21 |— Reserved.
20 |SSE42 SSE4.2 instruction support. See “Determining Media and x87 Feature Support” in APM2 and individual SSE4.2 instruction listings in APM4.
19 |SSE41 SSE4.1 instruction support. See individual instruction listings in APM4.
18:14 |— Reserved.
13 |CMPXCHG16B CMPXCHG16B instruction support. See “CMPXCHG16B” in APM3.
12 |FMA FMA instruction support.
11:10 |— Reserved.
9 |SSSE3 Supplemental SSE3 instruction support.
8:4 |— Reserved.
3 |MONITOR MONITOR/MWAIT instructions. See “MONITOR” and “MWAIT” in APM3.
2 |— Reserved.
1 |PCLMULQDQ PCLMULQDQ instruction support. See the PCLMULQDQ / VPCLMULQDQ instruction in APM4.
0 |SSE3 SSE3 instruction support. See Appendix D “Instruction Subsets and CPUID Feature Sets” in APM3 for the list of instructions covered by the SSE3 feature bit. See APM4 for the definition of the SSE3 instructions.


CPUID Fn0000_0001_EDX Feature Identifiers


The value returned in EDX contains the following miscellaneous feature identifiers: 


Bits |Field Name Description
31:29 |— Reserved.
28 |HTT Hyper-threading technology. Indicates either that there is more than one thread per core or more than one core per compute unit. See “Legacy Method” on page 633.
27 |— Reserved.
26 |SSE2 SSE2 instruction support. See Appendix D “CPUID Feature Sets” in APM3.
25 |SSE SSE instruction support. See Appendix D “CPUID Feature Sets” in APM3 and “64-Bit Media Programming” in APM1.
24 |FXSR FXSAVE and FXRSTOR instructions. See “FXSAVE” and “FXRSTOR” in APM5.
23 |MMX MMX™ instructions. See Appendix D “CPUID Feature Sets” in APM3 and “128-Bit Media and Scientific Programming” in APM1.
22:20 |— Reserved.
19 |CLFSH CLFLUSH instruction support. See “CLFLUSH” in APM3.
18 |— Reserved.
17 |PSE36 Page-size extensions. The PDE[20:13] supplies physical address [39:32]. See “Page Translation and Protection” in APM2.
16 |PAT Page attribute table. See “Page-Attribute Table Mechanism” in APM2.
15 |CMOV Conditional move instructions. See “CMOV”, “FCMOV” in APM3.
14 |MCA Machine check architecture. See “Machine Check Mechanism” in APM2.
13 |PGE Page global extension. See “Page Translation and Protection” in APM2.
12 |MTRR Memory-type range registers. See “Page Translation and Protection” in APM2.
11 |SysEnterSysExit SYSENTER and SYSEXIT instructions. See “SYSENTER”, “SYSEXIT” in APM3.
10 |— Reserved.
9 |APIC Advanced programmable interrupt controller. Indicates APIC exists and is enabled. See “Exceptions and Interrupts” in APM2.
8 |CMPXCHG8B CMPXCHG8B instruction. See “CMPXCHG8B” in APM3.
7 |MCE Machine check exception. See “Machine Check Mechanism” in APM2.
6 |PAE Physical-address extensions. Indicates support for physical addresses ≥ 32b. Number of physical address bits above 32b is implementation specific. See “Page Translation and Protection” in APM2.
5 |MSR AMD model-specific registers. Indicates support for AMD model-specific registers (MSRs), with RDMSR and WRMSR instructions. See “Model Specific Registers” in APM2.
4 |TSC Time stamp counter. RDTSC and RDTSCP instruction support. See “Debug and Performance Resources” in APM2.
3 |PSE Page-size extensions. See “Page Translation and Protection” in APM2.
2 |DE Debugging extensions. See “Debug and Performance Resources” in APM2.
1 |VME Virtual-mode enhancements. CR4.VME, CR4.PVI, software interrupt indirection, expansion of the TSS with the software indirection bitmap, EFLAGS.VIF, EFLAGS.VIP. See “System Resources” in APM2.
0 |FPU x87 floating point unit on-chip. See “x87 Floating Point Programming” in APM1.


E.3.3 Functions 2h–4h—Reserved


CPUID Fn0000_000[4:2] Reserved 


These function numbers are reserved. 


E.3.4 Function 5h—Monitor and MWait Features 


This function provides feature identifiers for the MONITOR and MWAIT instructions. For more 
information see the description of the MONITOR instruction on page 414 and the MWAIT instruction 
on page 420. 




CPUID Fn0000_0005_EAX Monitor/MWait 


The value returned in EAX provides the following information: 


Bits |Field Name Description 
31:16 | — Reserved. 
15:0 |MonLineSizeMin |Smallest monitor-line size in bytes. 


CPUID Fn0000_0005_EBX Monitor/MWait 


The value returned in EBX provides the following information: 


Bits |Field Name Description 
31:16 | — Reserved. 
15:0 |MonLineSizeMax |Largest monitor-line size in bytes. 


CPUID Fn0000_0005_ECX Monitor/MWait 


The value returned in ECX provides the following information: 


Bits |Field Name Description
31:2 |— Reserved.
1 |IBE Interrupt break-event. Indicates MWAIT can use ECX bit 0 to allow interrupts to cause an exit from the monitor event pending state, even if EFLAGS.IF=0.
0 |EMX Enumerate MONITOR/MWAIT extensions: Indicates enumeration MONITOR/MWAIT extensions are supported.


CPUID Fn0000_0005_EDX Monitor/MWait 


The value returned in EDX is undefined and is reserved. 
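
The field layout above can be unpacked with a few masks and shifts. The following is a minimal sketch, not a definitive implementation: it assumes the caller has already executed CPUID with EAX=5 and passes in the raw register values, and the structure and function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of CPUID Fn0000_0005. All names here are illustrative. */
struct monitor_info {
    uint16_t line_size_min;  /* EAX[15:0], smallest monitor-line size in bytes */
    uint16_t line_size_max;  /* EBX[15:0], largest monitor-line size in bytes */
    bool     enum_ext;       /* ECX[0], MONITOR/MWAIT extensions are enumerated */
    bool     intr_break;     /* ECX[1], interrupt break-event (IBE) */
};

/* Takes the raw EAX, EBX, and ECX values returned by the function. */
static struct monitor_info decode_fn5(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    struct monitor_info m;

    m.line_size_min = (uint16_t)(eax & 0xFFFFu);
    m.line_size_max = (uint16_t)(ebx & 0xFFFFu);
    m.enum_ext      = (ecx & 1u) != 0;
    m.intr_break    = ((ecx >> 1) & 1u) != 0;
    return m;
}
```

The upper 16 bits of EAX and EBX are reserved, so the sketch masks them off instead of assuming they are zero.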


E.3.5 Function 6h—Power Management Related Features 


This function provides information about the local APIC timer timebase and the effective frequency 
interface for the processor. 


CPUID Fn0000_0006_EAX Local APIC Timer Invariance 


The value returned in EAX provides the following information:


Bits |Field Name Description
31:3 | — Reserved.
2 |ARAT If set, indicates that the timebase for the local APIC timer is not affected by processor p-state.
1:0 | — Reserved. 


CPUID Fn0000_0006_EBX Reserved 


The value returned in EBX is undefined and is reserved. 


CPUID Fn0000_0006_ECX Effective Processor Frequency Interface 


The value returned in ECX indicates support of the processor effective frequency interface. For more 
information on this feature, see "Determining Processor Effective Frequency" in APM2. 


Bits |Field Name Description 
31:1 | — Reserved. 
0 |EffFreq Effective frequency interface support. If set, indicates presence of MSR0000_00E7 (MPERF) and MSR0000_00E8 (APERF).


CPUID Fn0000_0006_EDX Reserved 


The value returned in EDX is undefined and is reserved. 


E.3.6 Function 7h—Structured Extended Feature Identifiers 


CPUID Fn0000_0007_EAX_x0 Structured Extended Feature Identifiers (ECX=0) 


Bits |Field Name Description
31:0 |MaxSubFn Returns the number of subfunctions supported.


CPUID Fn0000_0007_EBX_x0 Structured Extended Feature Identifiers (ECX=0) 


Bits |Field Name Description 
31:30 | — Reserved. 
29 |SHA Secure Hash Algorithm instruction extension. 
28:25 | — Reserved. 
24 |CLWB CLWB instruction support. 
23 |CLFLUSHOPT CLFLUSHOPT instruction support. 
22 |RDPID RDPID instruction and TSC_AUX MSR support. 
21 |— Reserved. 
20 |SMAP Supervisor mode access prevention. 
19 |ADX ADCX, ADOX instruction support. 
18 |RDSEED RDSEED instruction support. 
17:11 |— Reserved.
10 | INVPCID INVPCID instruction support. 
9 |— Reserved. 
8 |BMI2 Bit manipulation group 2 instruction support. 
7 |SMEP Supervisor mode execution prevention. 
6 |— Reserved. 
5 |AVX2 AVX2 instruction subset support. 
4 |— Reserved. 
3 |BMI1 Bit manipulation group 1 instruction support. 
2:1 |— Reserved. 
0 |FSGSBASE FS and GS base read/write instruction support. 


CPUID Fn0000_0007_ECX_x0 Structured Extended Feature Identifiers (ECX=0) 


Bits |Field Name Description 
31:11 | — Reserved. 
10 | VPCMULQDQ Support for VPCLMULQDQ 256-bit instruction. 
9 | VAES Support for VAES 256-bit instructions. 
8 | — Reserved. 
7 |CET_SS Shadow Stacks supported. 
6:5 | — Reserved. 
4 | OSPKE OS has enabled Memory Protection Keys and use of the RDPKRU/WRPKRU 
instructions by setting CR4.PKE=1. 
3 | PKU Memory Protection Keys supported. 
2 | UMIP User mode instruction prevention support. 
1:0 | — Reserved. 
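
Software usually tests individual bits of these registers. The sketch below shows one way to do that for a few of the EBX and ECX bits listed above; it assumes the raw register values from CPUID Fn0000_0007 with ECX=0 are already available, and the enumerator and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the EBX and ECX tables for Fn0000_0007 (ECX=0). */
enum fn7_ebx_bit {
    FN7_EBX_FSGSBASE = 0,
    FN7_EBX_BMI1     = 3,
    FN7_EBX_AVX2     = 5,
    FN7_EBX_SMEP     = 7,
    FN7_EBX_BMI2     = 8,
    FN7_EBX_SHA      = 29
};

enum fn7_ecx_bit {
    FN7_ECX_UMIP   = 2,
    FN7_ECX_PKU    = 3,
    FN7_ECX_OSPKE  = 4,
    FN7_ECX_CET_SS = 7
};

/* Returns true if the given bit of a CPUID result register is set. */
static bool feature_bit(uint32_t reg, unsigned bit)
{
    return ((reg >> bit) & 1u) != 0;
}

/* Protection keys are usable only when the processor supports them (PKU)
 * and the OS has enabled them by setting CR4.PKE (reported as OSPKE). */
static bool pkeys_usable(uint32_t ecx)
{
    return feature_bit(ecx, FN7_ECX_PKU) && feature_bit(ecx, FN7_ECX_OSPKE);
}
```

Checking PKU and OSPKE together mirrors the distinction the table draws between hardware support and OS enablement.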


CPUID Fn0000_0007_EDX_x0 Structured Extended Feature Identifiers (ECX=0) 


Bits |Field Name Description
31:0 | — Reserved.


E.3.7 Functions 8h-Ah—Reserved 


E.3.8 Function Bh—Extended Topology Enumeration


CPUID Fn0000_000B enumerates each level in the processor’s topological hierarchy. The level 
number is specified by the input value passed in the ECX register. 


If this function is executed with an unimplemented level (passed in ECX), the instruction returns all 
zeros in the EAX register. 


Subfunction 0 of Fn0000_000B - Thread Level 


Subfunction 0 provides information about the thread-level topology. 


CPUID Fn0000_000B_EAX_x0 Extended Topology Enumeration (ECX=0) 


Bits |Field Name Description 
31:5 | — Reserved. 
4:0 | ThreadMaskWidth |Number of bits to shift x2APIC_ID right to get to the topology ID of the next level 


CPUID Fn0000_000B_EBX_x0 Extended Topology Enumeration (ECX=0) 


Bits |Field Name Description 
31:16 |— Reserved. 
15:0 Number of threads in a core 


CPUID Fn0000_000B_ECX_x0 Extended Topology Enumeration (ECX=0) 


Bits |Field Name Description 

31:16 | — Reserved. 

15:8 | level number returns ‘1’ indicating thread level 
7:0 | ECX input value returns ‘0’ 


CPUID Fn0000_000B_EDX_x0 Extended Topology Enumeration (ECX=0) 


Bits |Field Name Description 
31:0 | x2APIC_ID 32-bit Extended APIC_ID 


Subfunction 1 of Fn0000_000B - Core Level


Subfunction 1 provides information about the core-level topology. 


CPUID Fn0000_000B_EAX_x1 Extended Topology Enumeration (ECX=1) 


Bits |Field Name Description 
31:5 | — Reserved. 
4:0 | CoreMaskWidth Number of bits to shift x2APIC_ID right to get to the topology ID of the next level 


CPUID Fn0000_000B_EBX_x1 Extended Topology Enumeration (ECX=1) 


Bits |Field Name Description 
31:16 | — Reserved. 
15:0 Number of logical cores in socket 


CPUID Fn0000_000B_ECX_x1 Extended Topology Enumeration (ECX=1) 


Bits |Field Name Description 

31:16 |— Reserved. 

15:8 |level numbers returns ‘2’, indicating core-level 
7:0 |ECX input value returns ‘1’ 
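
The two mask widths can be combined to split an x2APIC_ID into per-level IDs. The following is a sketch under stated assumptions: it interprets ThreadMaskWidth and CoreMaskWidth as cumulative right-shift counts applied to the same x2APIC_ID, it assumes CoreMaskWidth is at least ThreadMaskWidth, and the structure and function names are illustrative.

```c
#include <stdint.h>

/* Hypothetical container for the IDs derived from one x2APIC_ID. */
struct topo_ids {
    uint32_t thread_id;  /* thread within its core */
    uint32_t core_id;    /* core within its socket */
    uint32_t socket_id;  /* remaining upper bits */
};

/* Mask with the low `width` bits set; width comes from a 5-bit field (0..31). */
static uint32_t low_mask(unsigned width)
{
    return (width == 0) ? 0u : (0xFFFFFFFFu >> (32u - width));
}

/* eax_x0 and eax_x1 are the raw EAX values returned for ECX=0 and ECX=1. */
static struct topo_ids split_x2apic_id(uint32_t x2apic_id,
                                       uint32_t eax_x0, uint32_t eax_x1)
{
    unsigned thread_width = eax_x0 & 0x1Fu;  /* ThreadMaskWidth, EAX_x0[4:0] */
    unsigned core_width   = eax_x1 & 0x1Fu;  /* CoreMaskWidth, EAX_x1[4:0] */
    struct topo_ids ids;

    ids.thread_id = x2apic_id & low_mask(thread_width);
    /* Assumption: core_width >= thread_width. */
    ids.core_id   = (x2apic_id & low_mask(core_width)) >> thread_width;
    ids.socket_id = x2apic_id >> core_width;
    return ids;
}
```

For example, with a thread width of 1 and a core width of 4, an x2APIC_ID of 2Dh splits into thread 1, core 6, and socket 2.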


CPUID Fn0000_000B_EDX_x1 Extended Topology Enumeration (ECX=1) 


Bits |Field Name Description
31:0 | x2APIC_ID 32-bit Extended APIC_ID


E.3.9 Function Ch—Reserved 


E.3.10 Function Dh—Processor Extended State Enumeration 


The XSAVE / XRSTOR instructions are used to save and restore x87/MMX FPU and SSE processor 
state. These instructions allow processor state associated with specific architected features to be 
selectively saved and restored. This function provides information about extended state support and 
save area size requirements. 


The function has a number of subfunctions specified by the input value passed to the CPUID 
instruction in the ECX register. If CPUID Fn0000_000D is executed with an unimplemented 
subfunction (passed in ECX), the instruction returns all zeros in the EAX, EBX, ECX, and EDX 
registers. 


Subfunction 0 of Fn0000_000D


Subfunction 0 provides information about features within the extended processor state management 
architecture that are supported by the processor. 


CPUID Fn0000_000D_EAX_x0 Processor Extended State Enumeration (ECX=0) 


The value returned in EAX provides a bit mask specifying which of the features defined by the 
extended processor state architecture are supported by the processor. 


Bits |Field Name Description
31:0 |XFeatureSupportedMask[31:0] Reports the valid bit positions for the lower 32 bits of the XFeatureEnabledMask register. If a bit is set, the corresponding feature is supported. See “XSAVE/XRSTOR Instructions” in APM2.


CPUID Fn0000_000D_EBX_x0 Processor Extended State Enumeration (ECX=0) 


The value returned in EBX gives the save area size requirement in bytes based on the features
currently enabled in the XFEATURE_ENABLED_MASK (XCR0).


Bits |Field Name Description
31:0 |XFeatureEnabledSizeMax Size in bytes of XSAVE/XRSTOR area for the currently enabled features in XCR0.


CPUID Fn0000_000D_ECX_x0 Processor Extended State Enumeration (ECX=0) 


The value returned in ECX gives the save area size requirement in bytes for all extended state 
management features supported by the processor (whether enabled or not). 


Bits |Field Name Description
31:0 |XFeatureSupportedSizeMax Size in bytes of XSAVE/XRSTOR area for all features that the logical processor supports. See XFeatureEnabledSizeMax.


CPUID Fn0000_000D_EDX_x0 Processor Extended State Enumeration (ECX=0) 


The value returned in EDX provides a bit mask specifying which of the features defined by the 
extended processor state architecture are supported by the processor. 


Bits |Field Name Description
31:0 |XFeatureSupportedMask[63:32] Reports the valid bit positions for the upper 32 bits of the XFeatureEnabledMask register. If a bit is set, the corresponding feature is supported. See “XSAVE/XRSTOR Instructions” in APM2 and reference pages for the individual instructions in APM4.


Subfunction 1 of Fn0000_000D


Subfunction 1 provides additional information about features within the extended processor state 
management architecture that are supported by the processor. 


CPUID Fn0000_000D_EAX_x1 Processor Extended State Enumeration (ECX=1) 


Bits |Field Name Description
31:4 | — Reserved.
3 |XSAVES XSAVES, XRSTORS, and XSS are supported.
2 |XGETBV XGETBV with ECX = 1 supported.
1 |XSAVEC XSAVEC and compact XRSTOR supported.
0 |XSAVEOPT XSAVEOPT is available.


CPUID Fn0000_000D_EBX_x1 Processor Extended State Enumeration (ECX=1) 


The value returned in EBX represents the fixed size of the save area (240h) plus the state size of each
enabled extended feature:


EBX = 0240h
+ ((XCR0[AVX] == 1) ? 0000_0100h : 0)
+ ((XCR0[MPK] == 1) ? 0000_0008h : 0)
+ ((XSS[CET_U] == 1) ? 0000_0010h : 0) 
+ ((XSS[CET_S] == 1) ? 0000_0018h : 0) 
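
The sum above translates directly into code. The sketch below is illustrative only: the XSS bit positions come from the ECX table for this subfunction, while the XCR0 bit positions for AVX (bit 2) and MPK (bit 9) are assumptions taken from the architectural XCR0 layout and are not stated in this section. The function name is hypothetical.

```c
#include <stdint.h>

#define XCR0_AVX   (1ull << 2)   /* assumption: XCR0 bit for AVX (YMM) state */
#define XCR0_MPK   (1ull << 9)   /* assumption: XCR0 bit for protection-key state */
#define XSS_CET_U  (1ull << 11)  /* CET user state, bit 11 of the XSS MSR */
#define XSS_CET_S  (1ull << 12)  /* CET supervisor state, bit 12 of the XSS MSR */

/* Computes the value described for EBX: the fixed 240h-byte area plus the
 * state size of each enabled extended feature. */
static uint32_t xsave_enabled_size(uint64_t xcr0, uint64_t xss)
{
    uint32_t size = 0x240u;

    if (xcr0 & XCR0_AVX)  size += 0x100u;
    if (xcr0 & XCR0_MPK)  size += 0x008u;
    if (xss  & XSS_CET_U) size += 0x010u;
    if (xss  & XSS_CET_S) size += 0x018u;
    return size;
}
```

With only x87, SSE, and AVX state enabled in XCR0 (value 7h) and XSS clear, the result is 340h; with all four optional components enabled it is 370h.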


CPUID Fn0000_000D_ECX_x1 Processor Extended State Enumeration (ECX=1) 


The value returned in ECX contains a 1 for each bit that is settable in the XSS MSR. The following
bits are defined:


Bits |Field Name Description
31:13 | — Reserved.
12 |CET_S CET supervisor state.
11 |CET_U CET user state.
10:0 | — Reserved.


CPUID Fn0000_000D_EDX_x1 Processor Extended State Enumeration (ECX=1) 


The value returned in EDX for subfunction 1 is undefined and reserved.


Subfunction 2 of Fn0000_000D


Subfunction 2 provides information about the size and offset of the 256-bit SSE vector floating point 
processor unit state save area. 


CPUID Fn0000_000D_EAX_x2 Processor Extended State Enumeration (ECX=2) 


The value returned in EAX provides information about the size of the 256-bit SSE vector floating 
point processor unit state save area. 


Bits |Field Name Description
31:0 |YmmSaveStateSize YMM state save size. The state save area size in bytes for the YMM registers.


CPUID Fn0000_000D_EBX_x2 Processor Extended State Enumeration (ECX=2) 


The value returned in EBX provides information about the offset of the 256-bit SSE vector floating 
point processor unit state save area from the base of the extended state (XSAVE/XRSTOR) save area. 


Bits |Field Name Description
31:0 |YmmSaveStateOffset YMM state save offset. The offset in bytes from the base of the extended state save area of the YMM register state save area.


CPUID Fn0000_000D_E[D,C]X_x2 Processor Extended State Enumeration (ECX=2) 


The values returned in ECX and EDX for subfunction 2 are undefined and are reserved. 
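
A size and offset pair like the one returned by subfunction 2 can be used to locate a component inside a save area and to check that it fits. This is a minimal sketch that assumes the offsets apply to the standard (non-compacted) save area layout; the function names are hypothetical and the area size would typically come from subfunction 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a component of `size` bytes at byte `offset` lies entirely
 * within a save area of `area_size` bytes. Written to avoid overflow. */
static bool component_fits(uint32_t size, uint32_t offset, uint32_t area_size)
{
    return offset <= area_size && size <= area_size - offset;
}

/* Returns a pointer to the component within a caller-provided save area.
 * The caller is expected to have checked component_fits() first. */
static uint8_t *component_ptr(uint8_t *area, uint32_t offset)
{
    return area + offset;
}
```

For instance, a 100h-byte component at offset 240h fits in a 340h-byte area but not in one that is a single byte smaller.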


Subfunction 11 of Fn0000_000D


Subfunction 11 provides information about the CET user state save area.


CPUID Fn0000_000D_E[A, B, C, D]X_x11 Processor Extended State Enumeration (ECX=11)


The values returned in EAX, EBX, ECX, and EDX provide information about the CET user state save
area.


Register |Bits |Field Name Description
EAX |31:0 |CetUserSize CET user state save size in bytes
EBX |31:0 |CetUserOffset CET user state offset from the base of the extended state save area
ECX |0 |U/S Set to '1', indicating a supervisor state component
ECX |31:1 | — Cleared to 0
EDX |31:0 | — Unused, cleared to 0


Subfunction 12 of Fn0000_000D


Subfunction 12 provides information about the CET supervisor state save area.


CPUID Fn0000_000D_E[A, B, C, D]X_x12 Processor Extended State Enumeration (ECX=12)


The values returned in EAX, EBX, ECX, and EDX provide information about the CET supervisor state
save area.


Register |Bits |Field Name Description
EAX |31:0 |CetSupervisorSize CET supervisor state save size in bytes
EBX |31:0 |CetSupervisorOffset CET supervisor state offset from the base of the extended state save area
ECX |0 |U/S Set to '1', indicating a supervisor state component
ECX |31:1 | — Cleared to 0
EDX |31:0 | — Unused, cleared to 0


Subfunction 3Eh of Fn0000_000D


Subfunction 3Eh provides information about the size and offset of the Lightweight Profiling (LWP) 
unit state save area. 


CPUID Fn0000_000D_EAX_x3E Processor Extended State Enumeration (ECX=62) 


The value returned in EAX provides the size of the Lightweight Profiling (LWP) unit state save area. 


Bits |Field Name Description
31:0 |LwpSavestateSize LWP state save area size. The size of the save area for LWP state in bytes. See “Lightweight Profiling” in APM2.


CPUID Fn0000_000D_EBX_x3E Processor Extended State Enumeration (ECX=62) 


The value returned in EBX provides the offset of the Lightweight Profiling (LWP) unit state save area 
from the base of the extended state (XSAVE/XRSTOR) save area. 


Bits |Field Name Description
31:0 |LwpSaveStateOffset LWP state save byte offset. The offset in bytes from the base of the extended state save area of the state save area for LWP. See “Lightweight Profiling” in APM2.


CPUID Fn0000_000D_E[D,C]X_x3E Processor Extended State Enumeration (ECX=62) 


The values returned in ECX and EDX for subfunction 3Eh are undefined and are reserved. 


Subfunctions of Fn0000_000D greater than 3Eh 


For CPUID Fn0000_000D, if the subfunction (specified by contents of ECX) passed as input to the 
instruction is greater than 3Eh, the instruction returns zero in the EAX, EBX, ECX, and EDX registers. 


E.3.11 Functions 4000_0000h-4000_00FFh—Reserved for Hypervisor Use
CPUID Fn4000_00[FF:00] Reserved


These function numbers are reserved for use by the virtual machine monitor. 


E.4 Extended Feature Function Numbers 
This section describes each of the defined CPUID functions in the extended range. 


E.4.1 Function 8000_0000h—Maximum Extended Function Number and Vendor String


This function provides information about the maximum extended function number supported on this 
processor and a string that identifies the vendor of the product. 


CPUID Fn8000_0000_EAX Largest Extended Function Number 


The value returned in EAX provides the largest extended function number supported by the processor. 


Bits |Field Name Description
31:0 |LFuncExt Largest extended function. The largest CPUID extended function input value supported by the processor implementation.


CPUID Fn8000_0000_E[D,C,B]X Processor Vendor 


The values returned in EBX, ECX, and EDX together provide a 12-character string identifying the 
vendor of this processor. The output string is the same as the one returned by Fn0000_0000. See 
CPUID Fn0000_0000_E[D,C,B]X on page 594 for more details. 


Bits |Field Name Description
31:0 |Vendor Four characters of the 12-byte character string (encoded in ASCII) “AuthenticAMD”. See Table E-2 below.


Table E-2. CPUID Fn8000_0000_E[D,C,B]X values 


Register Value Description 
CPUID Fn8000_0000_EBX 6874_7541h |The ASCII characters “ht uA”. 
CPUID Fn8000_0000_ECX 444D_4163h |The ASCII characters “DM Ac”. 
CPUID Fn8000_0000_EDX 6974_6E65h |The ASCII characters “i tn e”. 
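
The register values in Table E-2 can be reassembled into the vendor string by reading each register least-significant byte first, in the order EBX, EDX, ECX. The following sketch shows this under the assumption that the raw register values are already available; the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Writes the 12-character vendor string plus a terminating null into out. */
static void vendor_string(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };  /* string order: EBX, EDX, ECX */

    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);
    out[12] = '\0';
}

/* Returns true if the three registers spell "AuthenticAMD". */
static bool is_amd_vendor(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char v[13];

    vendor_string(ebx, ecx, edx, v);
    return strcmp(v, "AuthenticAMD") == 0;
}
```

Passing the three values from Table E-2 to is_amd_vendor() returns true.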


E.4.2 Function 8000_0001h—Extended Processor and Processor Feature Identifiers
CPUID Fn8000_0001_EAX AMD Family, Model, Stepping 


The value returned in EAX provides the family, model, and stepping identifiers. Three values are used 
by software to identify a processor: Family, Model, and Stepping. The value returned in EAX is the 
same as the value returned in EAX for Fn0000_0001. See CPUID Fn0000_0001_EAX on page 595 
for more details on the field definitions. 


Bits |Field Names Description 
31:0 |Family, Model, Stepping See CPUID Fn0000_0001_EAX.


CPUID Fn8000_0001_EBX BrandId Identifier


The value returned in EBX provides the package type and a 16-bit processor name string identifier.


Bits |Field Name Description
31:28 |PkgType Package type. If (Family[7:0] >= 10h), this field is valid. If (Family[7:0] < 10h), this field is reserved.
27:16 | — Reserved.
15:0 |BrandId Brand ID. This field, in conjunction with CPUID Fn0000_0001_EBX[8BitBrandId], is used by system firmware to generate the processor name string. See your processor revision guide for how to program the processor name string.


For processor families 10h and greater, PkgType is described in the BIOS and Kernel Developer’s 
Guide for the product. 


CPUID Fn8000_0001_ECX Feature Identifiers


This function contains the following miscellaneous feature identifiers: 


Bits |Field Name Description
31 | — Reserved.
30 |AddrMaskExt Breakpoint addressing masking extended to bit 31.
29 |MONITORX Support for MWAITX and MONITORX instructions.
28 |PerfCtrExtLLC Support for L3 performance counter extension.
27 |PerfTsc Performance time-stamp counter. Indicates support for MSRC001_0280 [Performance Time Stamp Counter].
26 |DataBkptExt Data access breakpoint extension. Indicates support for MSRC001_1027 and MSRC001_101[B:9].
25 | — Reserved.
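24 |PerfCtrExtNB NB performance counter extensions support.
23 |PerfCtrExtCore Processor performance counter extensions support. Indicates support for MSRC001_020[A,8,6,4,2,0] and MSRC001_020[B,9,7,5,3,1].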
22 |TopologyExtensions Topology extensions support. Indicates support for CPUID Fn8000_001D_EAX_x[N:0]-CPUID Fn8000_001E_EDX.
21 |TBM Trailing bit manipulation instruction support.
20 | — Reserved.
19 | — Reserved.
18 | — Reserved.
17 |TCE Translation Cache Extension support.
16 |FMA4 Four-operand FMA instruction support.
15 |LWP Lightweight profiling support. See “Lightweight Profiling” in APM2 and reference pages for individual LWP instructions in APM3.
14 | — Reserved.
13 |WDT Watchdog timer support. See APM2 and APM3. Indicates support for MSRC001_0074.
12 |SKINIT SKINIT and STGI are supported. Indicates support for SKINIT and STGI, independent of the value of MSRC000_0080[SVME]. See APM2 and APM3.
11 |XOP Extended operation support.
10 |IBS Instruction based sampling. See “Instruction Based Sampling” in APM2.
9 |OSVW OS visible workaround. Indicates OS-visible workaround support. See “OS Visible Work-around (OSVW) Information” in APM2.
8 |3DNowPrefetch PREFETCH and PREFETCHW instruction support. See “PREFETCH” and “PREFETCHW” in APM3.
7 |MisAlignSse Misaligned SSE mode. See “Misaligned Access Support Added for SSE Instructions” in APM1.
6 |SSE4A EXTRQ, INSERTQ, MOVNTSS, and MOVNTSD instruction support. See “EXTRQ”, “INSERTQ”, “MOVNTSS”, and “MOVNTSD” in APM4.
5 |ABM Advanced bit manipulation. LZCNT instruction support. See “LZCNT” in APM3.
4 |AltMovCr8 LOCK MOV CR0 means MOV CR8. See “MOV(CRn)” in APM3.
3 |ExtApicSpace Extended APIC space. This bit indicates the presence of extended APIC register space starting at offset 400h from the “APIC Base Address Register,” as specified in the BKDG.
2 |SVM Secure virtual machine. See “Secure Virtual Machine” in APM2.
1 |CmpLegacy Core multi-processing legacy mode. See “Legacy Method” on page 633.
0 |LahfSahf LAHF and SAHF instruction support in 64-bit mode. See “LAHF” and “SAHF” in APM3.


CPUID Fn8000_0001_EDX Feature Identifiers


This function contains the following miscellaneous feature identifiers: 


Bits |Field Name Description
31 |3DNow 3DNow! instructions. See Appendix D “Instruction Subsets and CPUID Feature Sets” in APM3.
30 |3DNowExt AMD extensions to 3DNow! instructions. See Appendix D “Instruction Subsets and CPUID Feature Sets” in APM3.
29 |LM Long mode. See “Processor Initialization and Long-Mode Activation” in APM2.
28 | — Reserved.
27 |RDTSCP RDTSCP instruction. See “RDTSCP” in APM3.
26 |Page1GB 1-GB large page support. See “1-GB Paging Support” in APM2.
25 |FFXSR FXSAVE and FXRSTOR instruction optimizations. See “FXSAVE” and “FXRSTOR” in APM5.
24 |FXSR FXSAVE and FXRSTOR instructions. Same as CPUID Fn0000_0001_EDX[FXSR].
23 |MMX MMX™ instructions. Same as CPUID Fn0000_0001_EDX[MMX].
22 |MmxExt AMD extensions to MMX instructions. See Appendix D “Instruction Subsets and CPUID Feature Sets” in APM3 and “128-Bit Media and Scientific Programming” in APM1.
21 | — Reserved.
20 |NX No-execute page protection. See “Page Translation and Protection” in APM2.
19:18 | — Reserved.
17 |PSE36 Page-size extensions. Same as CPUID Fn0000_0001_EDX[PSE36].
16 |PAT Page attribute table. Same as CPUID Fn0000_0001_EDX[PAT].
15 |CMOV Conditional move instructions. Same as CPUID Fn0000_0001_EDX[CMOV].
14 |MCA Machine check architecture. Same as CPUID Fn0000_0001_EDX[MCA].
13 |PGE Page global extension. Same as CPUID Fn0000_0001_EDX[PGE].
12 |MTRR Memory-type range registers. Same as CPUID Fn0000_0001_EDX[MTRR].
11 |SysCallSysRet SYSCALL and SYSRET instructions. See “SYSCALL” and “SYSRET” in APM3.
10 | — Reserved.
9 |APIC Advanced programmable interrupt controller. Same as CPUID Fn0000_0001_EDX[APIC].
8 |CMPXCHG8B CMPXCHG8B instruction. Same as CPUID Fn0000_0001_EDX[CMPXCHG8B].
7 |MCE Machine check exception. Same as CPUID Fn0000_0001_EDX[MCE].
6 |PAE Physical-address extensions. Same as CPUID Fn0000_0001_EDX[PAE].
5 |MSR AMD model-specific registers. Same as CPUID Fn0000_0001_EDX[MSR].
4 |TSC Time stamp counter. Same as CPUID Fn0000_0001_EDX[TSC].
3 |PSE Page-size extensions. Same as CPUID Fn0000_0001_EDX[PSE].
2 |DE Debugging extensions. Same as CPUID Fn0000_0001_EDX[DE].
1 |VME Virtual-mode enhancements. Same as CPUID Fn0000_0001_EDX[VME].
0 |FPU x87 floating-point unit on-chip. Same as CPUID Fn0000_0001_EDX[FPU].


E.4.3 Functions 8000_0002h-8000_0004h—Extended Processor Name String 
CPUID Fn8000_000[4:2]_E[D,C,B,A]X Processor Name String Identifier


The three extended functions from Fn8000_0002 to Fn8000_0004 are programmed to return a
null-terminated ASCII string up to 48 characters in length corresponding to the processor name.


Bits |Field Name Description
31:0 |ProcName Four characters of the extended processor name string.


The 48-character maximum includes the terminating null character. The 48-character string is ordered
first to last (left to right) as follows:

Fn8000_0002[EAX[7:0],..., EAX[31:24], EBX[7:0],..., EBX[31:24], ECX[7:0],..., ECX[31:24],
EDX[7:0],..., EDX[31:24]],

Fn8000_0003[EAX[7:0],..., EAX[31:24], EBX[7:0],..., EBX[31:24], ECX[7:0],..., ECX[31:24],
EDX[7:0],..., EDX[31:24]],

Fn8000_0004[EAX[7:0],..., EAX[31:24], EBX[7:0],..., EBX[31:24], ECX[7:0],..., ECX[31:24],
EDX[7:0],..., EDX[31:24]].


The extended processor name string is programmed by system firmware. See your processor revision 
guide for information about how to display the extended processor name string. 
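
The byte ordering described above can be sketched as a loop over the twelve returned registers. This example assumes the caller has collected the register values into an array ordered EAX, EBX, ECX, EDX for Fn8000_0002, then Fn8000_0003, then Fn8000_0004; the function name and the array convention are illustrative.

```c
#include <stdint.h>

/* regs holds 12 values: EAX, EBX, ECX, EDX of Fn8000_0002, then the same
 * four registers of Fn8000_0003 and Fn8000_0004. Each register contributes
 * four characters, least-significant byte first. */
static void processor_name(const uint32_t regs[12], char out[49])
{
    for (int r = 0; r < 12; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);

    /* The string is expected to be null-terminated within 48 bytes; the
     * extra terminator guards against a string that is not. */
    out[48] = '\0';
}
```

The output buffer is one byte larger than the 48-character maximum so that the result is always a valid C string.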


E.4.4 Function 8000_0005h—L1 Cache and TLB Information


This function provides first-level cache and TLB characteristics for the processor that executes the
instruction.


CPUID Fn8000_0005_EAX L1 TLB 2M/4M Information 


The value returned in EAX provides information about the L1 TLB for 2-MB and 4-MB pages. 


Bits |Field Name Description
31:24 |L1DTlb2and4MAssoc Data TLB associativity for 2-MB and 4-MB pages. Encoding is per Table E-3 below.
23:16 |L1DTlb2and4MSize Data TLB number of entries for 2-MB and 4-MB pages. The value returned is for the number of entries available for the 2-MB page size; 4-MB pages require two 2-MB entries, so the number of entries available for the 4-MB page size is one-half the returned value.
15:8 |L1ITlb2and4MAssoc Instruction TLB associativity for 2-MB and 4-MB pages. Encoding is per Table E-3 below.
7:0 |L1ITlb2and4MSize Instruction TLB number of entries for 2-MB and 4-MB pages. The value returned is for the number of entries available for the 2-MB page size; 4-MB pages require two 2-MB entries, so the number of entries available for the 4-MB page size is one-half the returned value.


The associativity fields (L1DTlb2and4MAssoc and L1ITlb2and4MAssoc) are encoded as follows:


Table E-3. L1 Cache and TLB Associativity Field Encodings


Associativity [7:0] |Definition
00h |Reserved
01h |1 way (direct mapped)
02h-FEh |n-way associative (field encodes n)
FFh |Fully associative


CPUID Fn8000_0005_EBX L1 TLB 4K Information 


The value returned in EBX provides information about the L1 TLB for 4-KB pages. 


Bits |Field Name Description
31:24 |L1DTlb4KAssoc Data TLB associativity for 4-KB pages. Encoding is per Table E-3 above.
23:16 |L1DTlb4KSize Data TLB number of entries for 4-KB pages.
15:8 |L1ITlb4KAssoc Instruction TLB associativity for 4-KB pages. Encoding is per Table E-3 above.
7:0 |L1ITlb4KSize Instruction TLB number of entries for 4-KB pages.


The associativity fields (L1DTlb4KAssoc and L1ITlb4KAssoc) are encoded as specified in Table E-3
on page 613.


CPUID Fn8000_0005_ECX L1 Data Cache Information


The value returned in ECX provides information about the first level data cache. 


Bits |Field Name Description
31:24 |L1DcSize L1 data cache size in KB.

23:16 |L1DcAssoc L1 data cache associativity. Encoding is per Table E-3. 
15:8 |L1DcLinesPerTag |L1 data cache lines per tag. 
7:0 |L1DcLineSize L1 data cache line size in bytes. 


The associativity field (L1DcAssoc) is encoded as specified in Table E-3 on page 613. 
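
The four byte-wide fields of this register can be separated as shown in the sketch below. The structure and function names are illustrative, and the set-count helper relies on an assumption not stated in this section: that the number of sets equals the cache size divided by the product of the way count and the line size.

```c
#include <stdint.h>

/* Raw fields of CPUID Fn8000_0005_ECX (the EDX layout is the same). */
struct l1_cache {
    unsigned size_kb;        /* bits 31:24 */
    unsigned assoc_field;    /* bits 23:16, encoded per Table E-3 */
    unsigned lines_per_tag;  /* bits 15:8 */
    unsigned line_size;      /* bits 7:0, bytes */
};

static struct l1_cache decode_l1_cache(uint32_t reg)
{
    struct l1_cache c;

    c.size_kb       = (reg >> 24) & 0xFFu;
    c.assoc_field   = (reg >> 16) & 0xFFu;
    c.lines_per_tag = (reg >> 8) & 0xFFu;
    c.line_size     = reg & 0xFFu;
    return c;
}

/* Table E-3: 01h-FEh encode the way count directly. Returns 0 for the
 * reserved (00h) and fully associative (FFh) encodings. */
static unsigned l1_ways(unsigned assoc_field)
{
    return (assoc_field >= 0x01u && assoc_field <= 0xFEu) ? assoc_field : 0u;
}

/* Assumption: sets = size / (ways * line size). Returns 0 if undefined. */
static unsigned l1_sets(struct l1_cache c)
{
    unsigned ways = l1_ways(c.assoc_field);

    if (ways == 0 || c.line_size == 0)
        return 0;
    return (c.size_kb * 1024u) / (ways * c.line_size);
}
```

A register value of 2008_0140h, for example, decodes to a 32-KB, 8-way cache with one line per tag and 64-byte lines, which gives 64 sets under the stated assumption.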


CPUID Fn8000_0005_EDX L1 Instruction Cache Information


The value returned in EDX provides information about the first level instruction cache.


Bits |Field Name Description
31:24 |L1IcSize L1 instruction cache size in KB.
23:16 |L1IcAssoc L1 instruction cache associativity. Encoding is per Table E-3.
15:8 |L1IcLinesPerTag L1 instruction cache lines per tag.
7:0 |L1IcLineSize L1 instruction cache line size in bytes.


The associativity field (L1IcAssoc) is encoded as specified in Table E-3 on page 613. 


E.4.5 Function 8000_0006h—L2 Cache and TLB and L3 Cache Information


This function provides the second level cache and TLB characteristics for the logical processor that 
executes the instruction. The EDX register returns the processor’s third level cache characteristics that 
are shared by all logical processors in the package. 


CPUID Fn8000_0006_EAX L2 TLB 2M/4M Information 


The value returned in EAX provides information about the L2 TLB for 2-MB and 4-MB pages. 


Bits |Field Name Description
31:28 |L2DTlb2and4MAssoc L2 data TLB associativity for 2-MB and 4-MB pages. Encoding is per Table E-4 below.
27:16 |L2DTlb2and4MSize L2 data TLB number of entries for 2-MB and 4-MB pages. The value returned is for the number of entries available for the 2-MB page size; 4-MB pages require two 2-MB entries, so the number of entries available for the 4-MB page size is one-half the returned value.
15:12 |L2ITlb2and4MAssoc L2 instruction TLB associativity for 2-MB and 4-MB pages. Encoding is per Table E-4 below.
11:0 |L2ITlb2and4MSize L2 instruction TLB number of entries for 2-MB and 4-MB pages. The value returned is for the number of entries available for the 2-MB page size; 4-MB pages require two 2-MB entries, so the number of entries available for the 4-MB page size is one-half the returned value.


The associativity fields (L2DTlb2and4MAssoc and L2ITlb2and4MAssoc) are encoded as follows:


Table E-4. L2/L3 Cache and TLB Associativity Field Encoding 


Associativity [3:0] |Definition
0h |L2/L3 cache or TLB is disabled.
1h |Direct mapped.
2h |2-way associative.
3h |3-way associative.
4h |4-way associative.
5h |6-way associative.
6h |8-way associative.
8h |16-way associative.
9h |Value for all fields should be determined from Fn8000_001D.
Ah |32-way associative.
Bh |48-way associative.
Ch |64-way associative.
Dh |96-way associative.
Eh |128-way associative.
Fh |Fully associative.
All other encodings are reserved.


CPUID Fn8000_0006_EBX L2 TLB 4K Information 


The value returned in EBX provides information about the L2 TLB for 4-KB pages. 


Bits |Field Name Description
31:28 |L2DTlb4KAssoc L2 data TLB associativity for 4-KB pages. Encoding is per Table E-4 above.
27:16 |L2DTlb4KSize L2 data TLB number of entries for 4-KB pages.
15:12 |L2ITlb4KAssoc L2 instruction TLB associativity for 4-KB pages. Encoding is per Table E-4 above.
11:0 |L2ITlb4KSize L2 instruction TLB number of entries for 4-KB pages.


The associativity fields (L2DTlb4KAssoc and L2ITlb4KAssoc) are encoded per Table E-4 above.


CPUID Fn8000_0006_ECX L2 Cache Information


The value returned in ECX provides information about the L2 cache. 


Bits |Field Name Description 
31:16 |L2Size L2 cache size in KB. 
15:12 |L2Assoc L2 cache associativity. Encoding is per Table E-4 on page 615. 
11:8 |L2LinesPerTag L2 cache lines per tag. 
7:0 |L2LineSize L2 cache line size in bytes. 


The associativity field (L2Assoc) is encoded per Table E-4 on page 615. 
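
Because the 4-bit associativity encoding of Table E-4 is not the way count itself, a lookup table is a natural fit for decoding it. The sketch below is one possible approach; the sentinel values and all names are hypothetical choices made for illustration.

```c
#include <stdint.h>

/* Sentinels for Table E-4 encodings that do not map to a way count. */
enum {
    ASSOC_DISABLED        = 0,   /* 0h: cache or TLB is disabled */
    ASSOC_RESERVED        = -1,  /* 7h: reserved encoding */
    ASSOC_FULL            = -2,  /* Fh: fully associative */
    ASSOC_SEE_FN8000_001D = -3   /* 9h: consult Fn8000_001D instead */
};

/* Maps a Table E-4 encoding to a way count or to one of the sentinels. */
static int l2l3_ways(unsigned enc)
{
    static const int table[16] = {
        ASSOC_DISABLED, 1, 2, 3, 4, 6, 8, ASSOC_RESERVED,
        16, ASSOC_SEE_FN8000_001D, 32, 48, 64, 96, 128, ASSOC_FULL
    };
    return table[enc & 0xFu];
}

/* Decoded fields of CPUID Fn8000_0006_ECX. */
struct l2_cache {
    unsigned size_kb;        /* bits 31:16 */
    int      ways;           /* bits 15:12, decoded via l2l3_ways() */
    unsigned lines_per_tag;  /* bits 11:8 */
    unsigned line_size;      /* bits 7:0, bytes */
};

static struct l2_cache decode_l2_cache(uint32_t ecx)
{
    struct l2_cache c;

    c.size_kb       = ecx >> 16;
    c.ways          = l2l3_ways((ecx >> 12) & 0xFu);
    c.lines_per_tag = (ecx >> 8) & 0xFu;
    c.line_size     = ecx & 0xFFu;
    return c;
}
```

An ECX value of 0200_6140h, for example, decodes to a 512-KB, 8-way cache with one line per tag and 64-byte lines.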


CPUID Fn8000_0006_EDX L3 Cache Information


The value returned in EDX provides the third level cache characteristics shared by all logical
processors in the package.


Bits |Field Name Description
31:18 |L3Size Specifies the L3 cache size range: (L3Size[31:18] * 512KB) ≤ L3 cache size < ((L3Size[31:18]+1) * 512KB).
17:16 | — Reserved.
15:12 |L3Assoc L3 cache associativity. Encoded per Table E-4 on page 615. 
11:8 |L3LinesPerTag L3 cache lines per tag. 
7:0 |L3LineSize L3 cache line size in bytes. 


The associativity field (L3Assoc) is encoded per Table E-4 on page 615. 
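
Since L3Size reports a range in 512-KB units and not an exact size, software can compute both ends of that range. The following sketch does so in kilobytes; the function names are illustrative.

```c
#include <stdint.h>

/* Lower end of the reported L3 size range in KB: L3Size[31:18] * 512KB. */
static uint32_t l3_size_min_kb(uint32_t edx)
{
    return (edx >> 18) * 512u;
}

/* Upper end of the reported range in KB: (L3Size[31:18] + 1) * 512KB. */
static uint32_t l3_size_limit_kb(uint32_t edx)
{
    return ((edx >> 18) + 1u) * 512u;
}
```

An L3Size field of 64, for example, describes a cache of at least 32768 KB (32 MB) and less than 33280 KB.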


E.4.6 Function 8000_0007h—Processor Power Management and RAS Capabilities 


This function provides information about the power management, power reporting, and RAS
capabilities of the processor that executes the instruction. There may be other processor-specific
features and reporting capabilities not covered here. Refer to the BIOS and Kernel Developer’s Guide
for your specific product to obtain more information.


CPUID Fn8000_0007_EAX Reserved 


Bits |Field Name Description
31:0 | — Reserved.


CPUID Fn8000_0007_EBX RAS Capabilities 


The value returned in EBX provides information about RAS features that allow system software to
detect specific hardware errors.


Bits |Field Name Description
31:4 | — Reserved.
3 |ScalableMca 0=MCAX is not supported. 1=MCAX is supported; the MCAX MSR addresses are supported; MCA_CONFIG[Mcax] is present in all MCA banks.
2 |HWA Hardware assert supported. Indicates support for MSRC001_10[DF:C0].
1 |SUCCOR Software uncorrectable error containment and recovery capability. The processor supports software containment of uncorrectable errors through context synchronizing data poisoning and deferred error interrupts; see APM2, Chapter 9, “Determining Machine-Check Architecture Support.”
0 |McaOverflowRecov MCA overflow recovery support. If set, indicates that MCA overflow conditions (MCi_STATUS[Overflow]=1) are not fatal; software may safely ignore such conditions. If clear, MCA overflow conditions require software to shut down the system. See APM2, Chapter 9, “Handling Machine Check Exceptions.”


CPUID Fn8000_0007_ECX Processor Power Monitoring Interface 


The value returned in ECX provides information about the implementation of the processor power 
monitoring interface. 


Bits |Field Name Description
31:0 |CpuPwrSampleTimeRatio Specifies the ratio of the compute unit power accumulator sample period to the TSC counter period. Returns a value of 0 if not applicable for the system.


CPUID Fn8000_0007_EDX Advanced Power Management Features 


The value returned in EDX provides information about the advanced power management and power 
reporting features available. Refer to the BIOS and Kernel Developer’s Guide for your specific product 
for a detailed description of the definition of each power management feature. 


Bits |Field Name |Description
31:13 | — |Reserved.
12 |ProcPowerReporting |Processor power reporting interface supported.
11 |ProcFeedbackInterface |Processor feedback interface. Value: 1. 1=Indicates support for processor feedback interface. Note: This feature is deprecated.
10 |EffFreqRO |Read-only effective frequency interface. 1=Indicates presence of MSRC000_00E7 [Read-Only Max Performance Frequency Clock Count (MPerfReadOnly)] and MSRC000_00E8 [Read-Only Actual Performance Frequency Clock Count (APerfReadOnly)].
9 |CPB |Core performance boost.
8 |TscInvariant |TSC invariant. The TSC rate is ensured to be invariant across all P-States, C-States, and stop grant transitions (such as STPCLK Throttling); therefore the TSC is suitable for use as a source of time. 0 = No such guarantee is made and software should avoid attempting to use the TSC as a source of time.
7 |HwPstate |Hardware P-state control. MSRC001_0061 [P-state Current Limit], MSRC001_0062 [P-state Control] and MSRC001_0063 [P-state Status] exist.
6 |100MHzSteps |100 MHz multiplier Control.
5 | — |Reserved.
4 |TM |Hardware thermal control (HTC).
3 |TTP |THERMTRIP.
2 |VID |Voltage ID control. Function replaced by HwPstate.
1 |FID |Frequency ID control. Function replaced by HwPstate.
0 |TS |Temperature sensor.
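

As a minimal sketch of how software might test the TscInvariant bit, the helper below takes the EDX value already returned by CPUID Fn8000_0007 and checks bit 8. The function name and the macro are illustrative only and are not defined by the architecture; executing CPUID itself is left to the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position of TscInvariant in CPUID Fn8000_0007_EDX (hypothetical macro name). */
#define FN8000_0007_EDX_TSC_INVARIANT_BIT 8u

/* Returns true when EDX reports that the TSC rate is invariant. */
static bool tsc_is_invariant(uint32_t edx)
{
    return ((edx >> FN8000_0007_EDX_TSC_INVARIANT_BIT) & 1u) != 0;
}
```

Taking the register value as a parameter keeps the decoding separate from the instruction that produces it, so the same helper works on values captured on another machine.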


E.4.7 Function 8000_0008h—Processor Capacity Parameters and Extended Feature 
Identification 


This function provides the size or capacity of various architectural parameters that vary by 
implementation, as well as an extension to the Fn8000_0001 feature identifiers. 


CPUID Fn8000_0008_EAX Long Mode Size Identifiers 


The value returned in EAX provides information about the maximum host and guest physical and 
linear address width (in bits) supported by the processor. 


Bits |Field Name |Description
31:24 | — |Reserved.
23:16 |GuestPhysAddrSize |Maximum guest physical address size in bits. This number applies only to guests using nested paging. When this field is zero, refer to the PhysAddrSize field for the maximum guest physical address size. See “Secure Virtual Machine” in APM2.
15:8 |LinAddrSize |Maximum linear address size in bits.
7:0 |PhysAddrSize |Maximum physical address size in bits. When GuestPhysAddrSize is zero, this field also indicates the maximum guest physical address size.


The address width reported is the maximum supported in any mode. For long mode capable processors,
the size reported is independent of whether long mode is enabled. See “Processor Initialization
and Long-Mode Activation” in APM2.
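

The field layout above, including the fallback from GuestPhysAddrSize to PhysAddrSize, can be sketched as follows. The structure and function names are hypothetical; the input is the EAX value returned by CPUID Fn8000_0008.

```c
#include <stdint.h>

/* Hypothetical container for the decoded address widths, in bits. */
struct addr_sizes {
    unsigned phys_bits;        /* PhysAddrSize, EAX[7:0] */
    unsigned linear_bits;      /* LinAddrSize, EAX[15:8] */
    unsigned guest_phys_bits;  /* GuestPhysAddrSize, EAX[23:16], with fallback */
};

static struct addr_sizes decode_addr_sizes(uint32_t eax)
{
    struct addr_sizes s;
    s.phys_bits = eax & 0xFFu;
    s.linear_bits = (eax >> 8) & 0xFFu;
    s.guest_phys_bits = (eax >> 16) & 0xFFu;
    /* A zero GuestPhysAddrSize means PhysAddrSize also applies to guests. */
    if (s.guest_phys_bits == 0)
        s.guest_phys_bits = s.phys_bits;
    return s;
}
```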


CPUID Fn8000_0008_EBX Extended Feature Identifiers


The value returned in EBX is an extension to the Fn8000_0001 feature flags and indicates the presence
of various ISA extensions.


Bits |Field Name |Description
31:29 | — |Reserved
28 |PSFD |Predictive Store Forward Disable
27 | — |Reserved
26 |SsbdNotRequired |SSBD not needed on this processor
25 |SsbdVirtSpecCtrl |Use VIRT_SPEC_CTL for SSBD
24 |SSBD |Speculative Store Bypass Disable
23:22 | — |Reserved
21 |INVLPGBnestedPages |INVLPGB support for invalidating guest nested translations
20 |EferLmsleUnsupported |EFER.LMSLE is unsupported.
19 |IbrsSameMode |IBRS provides same mode speculation limits
18 |IbrsPreferred |IBRS is preferred over software solution
17 |StibpAlwaysOn |Processor prefers that STIBP be left on
16 |IbrsAlwaysOn |Processor prefers that IBRS be left on
15 |STIBP |Single Thread Indirect Branch Prediction mode
14 |IBRS |Indirect Branch Restricted Speculation
13 |INT_WBINVD |WBINVD/WBNOINVD are interruptible.
12 |IBPB |Indirect Branch Prediction Barrier
11:10 | — |Reserved
9 |WBNOINVD |WBNOINVD instruction supported
8 |MCOMMIT |MCOMMIT instruction supported
7:5 | — |Reserved
4 |RDPRU |RDPRU instruction supported
3 |INVLPGB |INVLPGB and TLBSYNC instruction supported
2 |RstrFpErrPtrs |FP Error Pointers Restored by XRSTOR
1 |InstRetCntMsr |Instruction Retired Counter MSR available
0 |CLZERO |CLZERO instruction supported


CPUID Fn8000_0008_ECX Size Identifiers 


The value returned in ECX provides information about the number of cores supported by the
processor, the width of the APIC ID, and the width of the performance time-stamp counter.


Bits |Field Name |Description
31:18 | — |Reserved.
17:16 |PerfTscSize |Performance time-stamp counter size. Indicates the size of MSRC001_0280[PTSC].
    Bits    Description
    00b     40 bits
    01b     48 bits
    10b     56 bits
    11b     64 bits
15:12 |ApicIdSize |APIC ID size. The number of bits in the initial APIC20[ApicId] value that indicate logical processor ID within a package. The size of this field determines the maximum number of logical processors (MNLP) that the package could theoretically support, and not the actual number of logical processors that are implemented or enabled in the package, as indicated by CPUID Fn8000_0008_ECX[NC]. A value of zero indicates that legacy methods must be used to determine the maximum number of logical processors, as indicated by CPUID Fn8000_0008_ECX[NC].
    if (ApicIdSize[3:0] == 0) {
        // Used by legacy dual-core/single-core processors
        MNLP = CPUID Fn8000_0008_ECX[NC] + 1;
    } else {
        // use ApicIdSize[3:0] field
        MNLP = (2 raised to the power of ApicIdSize[3:0]);
    }
11:8 | — |Reserved.
7:0 |NT |Number of physical threads - 1. The number of threads in the processor is NT+1 (e.g., if NT = 0, then there is one thread). See “Legacy Method” on page 633.
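

The PerfTscSize encoding and the MNLP pseudocode in the table above can be sketched in C as below. Both helpers take the ECX value returned by CPUID Fn8000_0008; the function names are illustrative, and the low byte ECX[7:0] is used for the legacy branch as the pseudocode indicates.

```c
#include <stdint.h>

/* Width in bits of MSRC001_0280[PTSC], from PerfTscSize (ECX[17:16]). */
static unsigned perf_tsc_width_bits(uint32_t ecx)
{
    unsigned code = (ecx >> 16) & 0x3u;
    return 40u + 8u * code;   /* 00b=40, 01b=48, 10b=56, 11b=64 */
}

/* Maximum number of logical processors (MNLP) the package could support. */
static unsigned max_logical_processors(uint32_t ecx)
{
    unsigned apic_id_size = (ecx >> 12) & 0xFu;   /* ApicIdSize, ECX[15:12] */
    if (apic_id_size == 0)
        return (ecx & 0xFFu) + 1u;                /* legacy: ECX[7:0] + 1 */
    return 1u << apic_id_size;                    /* 2 ^ ApicIdSize */
}
```

Note that MNLP is an upper bound derived from the APIC ID width; the number of logical processors actually present still comes from ECX[7:0] + 1.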


CPUID Fn8000_0008_EDX RDPRU Register Identifier Range 


The value returned in EDX identifies the maximum recognized register identifier for the RDPRU 
instruction. 


Bits |Field Name |Description
63:32 | — |Reserved.
31:16 |MaxRdpruID |The maximum ECX value recognized by RDPRU.
15:0 |InvlpgbCountMax |Maximum page count for INVLPGB instruction.


E.4.8 Function 8000_0009h—Reserved


CPUID Fn8000_0009 Reserved

This function is reserved. 

E.4.9 Function 8000_000Ah—SVM Features 


This function provides information about the SVM features that the processor supports. If SVM is
not supported (CPUID Fn8000_0001_ECX[SVM] = 0), this function is reserved. 


CPUID Fn8000_000A_EAX SVM Revision and Feature Identification


The value returned in EAX provides the SVM revision number.


Bits |Field Name Description 
31:8 | — Reserved. 
7:0 |SvmRev SVM revision number. 


CPUID Fn8000_000A_EBX SVM Revision and Feature Identification


The value returned in EBX provides the number of address space identifiers (ASIDs) that the 
processor supports. 


Bits |Field Name |Description
31:0 |NASID |Number of available address space identifiers (ASID).


CPUID Fn8000_000A_ECX Reserved 


The value returned in ECX for this function is undefined and is reserved. 


CPUID Fn8000_000A_EDX SVM Feature Identification 


The value returned in EDX provides Secure Virtual Machine architecture feature information. All 
cross references in the table below are to sections within the Secure Virtual Machine chapter of APM2. 


Bits |Field Name |Description
31:25 | — |Reserved.
24 |TlbiCtl |Support for INVLPGB/TLBSYNC hypervisor enable in VMCB and TLBSYNC intercept.
23 |HOST_MCE_OVERRIDE |When host CR4.MCE=1 and guest CR4.MCE=0, machine check exceptions (#MC) in a guest do not cause shutdown and are always intercepted.
22:21 | — |Reserved.
20 |SpecCtrl |SPEC_CTRL virtualization.
19 |SSSCheck |SVM supervisor shadow stack restrictions. See “Supervisor Shadow Stack Restrictions” in Volume 2.
18 | — |Reserved.
17 |GMET |Guest Mode Execution Trap.
16 |VGIF |Virtualize the Global Interrupt Flag. See “Nested Virtualization.”
15 |VMSAVEvirt |VMSAVE and VMLOAD virtualization. See “Nested Virtualization.”
14 | — |Reserved.
13 |AVIC |Support for the AMD advanced virtual interrupt controller. See “Advanced Virtual Interrupt Controller.”
12 |PauseFilterThreshold |PAUSE filter threshold. Indicates support for the PAUSE filter cycle count threshold. See “Pause Intercept Filtering” in Volume 2.
11 | — |Reserved.
10 |PauseFilter |Pause intercept filter. Indicates support for the pause intercept filter. See “Pause Intercept Filtering.”
9:8 | — |Reserved.
7 |DecodeAssists |Decode assists. Indicates support for the decode assists. See “Decode Assists.”
6 |FlushByAsid |Flush by ASID. Indicates that TLB flush events, including CR3 writes and CR4.PGE toggles, flush only the current ASID's TLB entries. Also indicates support for the extended VMCB TLB_Control. See “TLB Control.”
5 |VmcbClean |VMCB clean bits. Indicates support for VMCB clean bits. See “VMCB Clean Bits.”
4 |TscRateMsr |MSR based TSC rate control. Indicates support for MSR TSC ratio MSRC000_0104. See “TSC Ratio MSR (C000_0104h).”
3 |NRIPS |NRIP save. Indicates support for NRIP save on #VMEXIT. See “State Saved on Exit.”
2 |SVML |SVM lock. Indicates support for SVM-Lock. See “Enabling SVM.”
1 |LbrVirt |LBR virtualization. Indicates support for LBR Virtualization. See “Enabling LBR Virtualization.”
0 |NP |Nested paging. Indicates support for nested paging. See “Nested Paging.”


E.4.10 Functions 8000_000Bh-8000_0018h—Reserved


CPUID Fn8000_00[18:0B] Reserved


These functions are reserved.


E.4.11 Function 8000_0019h—TLB Characteristics for 1GB pages


This function provides information about the TLB for 1 GB pages for the processor that executes the 
instruction. 


CPUID Fn8000_0019_EAX L1 TLB 1G Information


The value returned in EAX provides information about the L1 TLB for 1 GB pages. 


Bits |Field Name |Description
31:28 |L1DTlb1GAssoc |L1 data TLB associativity for 1 GB pages. See Table E-4 on page 615.
27:16 |L1DTlb1GSize |L1 data TLB number of entries for 1 GB pages.
15:12 |L1ITlb1GAssoc |L1 instruction TLB associativity for 1 GB pages. See Table E-4 on page 615.
11:0 |L1ITlb1GSize |L1 instruction TLB number of entries for 1 GB pages.


CPUID Fn8000_0019_EBX L2 TLB 1G Information


The value returned in EBX provides information about the L2 TLB for 1 GB pages. 


Bits |Field Name |Description
31:28 |L2DTlb1GAssoc |L2 data TLB associativity for 1 GB pages. See Table E-4 on page 615.
27:16 |L2DTlb1GSize |L2 data TLB number of entries for 1 GB pages.
15:12 |L2ITlb1GAssoc |L2 instruction TLB associativity for 1 GB pages. See Table E-4 on page 615.
11:0 |L2ITlb1GSize |L2 instruction TLB number of entries for 1 GB pages.
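

EAX and EBX of this function share the same field layout, so a single decoder can serve both the L1 and L2 tables. The sketch below assumes the caller has already executed CPUID Fn8000_0019; the structure and function names are hypothetical, and the associativity values are returned still encoded (see Table E-4).

```c
#include <stdint.h>

/* Hypothetical container for one register of Fn8000_0019 (EAX for L1, EBX for L2). */
struct tlb_1g_info {
    unsigned data_assoc;     /* bits 31:28, encoded per Table E-4 */
    unsigned data_entries;   /* bits 27:16 */
    unsigned instr_assoc;    /* bits 15:12, encoded per Table E-4 */
    unsigned instr_entries;  /* bits 11:0 */
};

static struct tlb_1g_info decode_tlb_1g(uint32_t reg)
{
    struct tlb_1g_info t;
    t.data_assoc = (reg >> 28) & 0xFu;
    t.data_entries = (reg >> 16) & 0xFFFu;
    t.instr_assoc = (reg >> 12) & 0xFu;
    t.instr_entries = reg & 0xFFFu;
    return t;
}
```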


CPUID Fn8000_0019_E[D,C]X Reserved


The values returned in ECX and EDX for this function are undefined and reserved for future use. 


E.4.12 Function 8000_001Ah—Instruction Optimizations 


CPUID Fn8000_001A_EAX Performance Optimization Identifiers 


This function returns performance related information. For more details on how to use these bits to optimize 
software, see the Software Optimization Guide applicable to your product. 


Bits |Field Name |Description
31:3 | — |Reserved.
2 |FP256 |The internal FP/SIMD execution datapath is 256 bits wide.
1 |MOVU |MOVU SSE instructions are more efficient and should be preferred to SSE MOVL/MOVH. MOVUPS is more efficient than MOVLPS/MOVHPS. MOVUPD is more efficient than MOVLPD/MOVHPD.
0 |FP128 |The internal FP/SIMD execution datapath is 128 bits wide.


CPUID Fn8000_001A_E[D,C,B]X Reserved


The values returned in EBX, ECX, and EDX are undefined for this function and are reserved.


E.4.13 Function 8000_001Bh—Instruction-Based Sampling Capabilities


If instruction-based sampling (IBS) is supported (CPUID Fn8000_0001_ECX[IBS] = 1), this CPUID 
function can be used to obtain IBS feature information. If IBS is not supported (CPUID 
Fn8000_0001_ECX[IBS] = 0), this function number is reserved. For more information on using IBS, 
see “Instruction-Based Sampling” in APM2. 


CPUID Fn8000_001B_EAX Instruction-Based Sampling Feature Indicators 


The value returned in EAX provides the following information about the specific features of IBS that 
the processor supports: 


Bits |Field Name |Description
31:9 | — |Reserved.
8 |OpBrnFuse |Fused branch micro-op indication supported.
7 |RipInvalidChk |Invalid RIP indication supported.
6 |OpCntExt |IbsOpCurCnt and IbsOpMaxCnt extend by 7 bits.
5 |BrnTrgt |Branch target address reporting supported.
4 |OpCnt |Op counting mode supported.
3 |RdWrOpCnt |Read write of op counter supported.
2 |OpSam |IBS execution sampling supported.
1 |FetchSam |IBS fetch sampling supported.
0 |IBSFFV |IBS feature flags valid.


CPUID Fn8000_001B_E[D,C,B]X Reserved


The values returned in EBX, ECX, and EDX are undefined and are reserved. 


E.4.14 Function 8000_001Ch—Lightweight Profiling Capabilities 


If lightweight profiling (LWP) is supported (CPUID Fn8000_0001_ECX[LWP] = 1), this CPUID
function can be used to obtain information about LWP features supported by the processor. If LWP is
not supported (CPUID Fn8000_0001_ECX[LWP] = 0), this function number is reserved. For more
information on using LWP, see “Lightweight Profiling” in APM2.


CPUID Fn8000_001C_EAX Lightweight Profiling Capabilities 0


The value returned in EAX provides the following information about LWP capabilities supported by 
the processor: 


Bits |Field Name |Description
31 |LwpInt |Interrupt on threshold overflow available.
30 |LwpPTSC |Performance time stamp counter in event record is available.
29 |LwpCont |Sampling in continuous mode is available.
28:7 | — |Reserved.
6 |LwpRNH |Core reference clocks not halted event available.
5 |LwpCNH |Core clocks not halted event available.
4 |LwpDME |DC miss event available.
3 |LwpBRE |Branch retired event available.
2 |LwpIRE |Instructions retired event available.
1 |LwpVAL |LWPVAL instruction available.
0 |LwpAvail |The LWP feature is available.


CPUID Fn8000_001C_EBX Lightweight Profiling Capabilities 0 


The value returned in EBX provides the following additional information about LWP capabilities 
supported by the processor: 


Bits |Field Name |Description
31:24 |LwpEventOffset |Offset in bytes from the start of the LWPCB to the EventInterval1 field.
23:16 |LwpMaxEvents |Maximum EventId value supported.
15:8 |LwpEventSize |Event record size. Size in bytes of an event record in the LWP event ring buffer.
7:0 |LwpCbSize |Control block size. Size in quadwords of the LWPCB.


CPUID Fn8000_001C_ECX Lightweight Profiling Capabilities 0 


The value returned in ECX provides the following additional information about LWP capabilities 
supported by the processor: 


Bits |Field Name |Description
31 |LwpCacheLatency |Cache latency filtering supported. Cache-related events can be filtered by latency.
30 |LwpCacheLevels |Cache level filtering supported. Cache-related events can be filtered by the cache level that returned the data.
29 |LwpIpFiltering |IP filtering supported.
28 |LwpBranchPrediction |Branch prediction filtering supported. Branches Retired events can be filtered based on whether the branch was predicted properly.
27:24 | — |Reserved.
23:16 |LwpMinBufferSize |Event ring buffer size. Minimum size of the LWP event ring buffer, in units of 32
15:9 |LwpVersion |Version of LWP implementation.
8:6 |LwpLatencyRnd |Amount by which cache latency is rounded.
5 |LwpDataAddress |Data cache miss address valid. Address is valid for cache miss event records.
4:0 |LwpLatencyMax |Latency counter size. Size in bits of the cache latency counters.


CPUID Fn8000_001C_EDX Lightweight Profiling Capabilities 0 


The value returned in EDX provides the following additional information about LWP capabilities 
supported by the processor: 


Bits |Field Name |Description
31 |LwpInt |Interrupt on threshold overflow supported.
30 |LwpPTSC |Performance time stamp counter in event record is supported.
29 |LwpCont |Sampling in continuous mode is supported.
28:7 | — |Reserved.
6 |LwpRNH |Core reference clocks not halted event is supported.
5 |LwpCNH |Core clocks not halted event is supported.
4 |LwpDME |DC miss event is supported.
3 |LwpBRE |Branch retired event is supported.
2 |LwpIRE |Instructions retired event is supported.
1 |LwpVAL |LWPVAL instruction is supported.
0 |LwpAvail |Lightweight profiling is supported.


E.4.15 Function 8000_001Dh—Cache Topology Information 


CPUID Fn8000_001D_E[D,C,B,A]X reports cache topology information for the cache enumerated by
the value passed to the instruction in ECX, referred to as Cache n in the following description. To
gather information for all cache levels, software must repeatedly execute CPUID with 8000_001Dh in
EAX and ECX set to increasing values beginning with 0 until a value of 00h is returned in the field
CacheType (EAX[4:0]), indicating that no more cache descriptions are available for this processor.
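

The enumeration loop described above can be sketched as follows. To keep the example runnable without AMD hardware, CPUID is reached through a caller-supplied callback; the `cpuid_fn` type, the `cpuid_regs` structure, and the `fake_cpuid` stand-in are all hypothetical names introduced for illustration, and the `max_index` bound is a defensive assumption rather than an architectural limit.

```c
#include <stdint.h>

struct cpuid_regs { uint32_t eax, ebx, ecx, edx; };

/* Hypothetical callback type: executes CPUID with EAX=leaf, ECX=subleaf. */
typedef struct cpuid_regs (*cpuid_fn)(uint32_t leaf, uint32_t subleaf);

/* Steps ECX from 0 until CacheType (EAX[4:0]) reads 00h and returns the
 * number of cache descriptions seen. max_index bounds the loop defensively. */
static unsigned count_cache_descriptions(cpuid_fn cpuid, unsigned max_index)
{
    unsigned n = 0;
    while (n < max_index) {
        struct cpuid_regs r = cpuid(0x8000001Du, n);
        if ((r.eax & 0x1Fu) == 0)   /* Null cache type: no more caches */
            break;
        n++;
    }
    return n;
}

/* Stand-in for real hardware: reports data, instruction, unified, unified. */
static struct cpuid_regs fake_cpuid(uint32_t leaf, uint32_t subleaf)
{
    static const uint32_t types[] = { 0x01u, 0x02u, 0x03u, 0x03u };
    struct cpuid_regs r = { 0, 0, 0, 0 };
    if (leaf == 0x8000001Du && subleaf < 4)
        r.eax = types[subleaf];
    return r;
}
```

On real hardware the callback would wrap the CPUID instruction, and software would first confirm that CPUID Fn8000_0001_ECX[TopologyExtensions] = 1.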


If CPUID Fn8000_0001_ECX[TopologyExtensions] = 0, then CPUID Fn8000_001Dh is reserved.
Any value in ECX which does not select an existing cache will return a Null cache type in EAX[4:0].


CPUID Fn8000_001D_EAX_x[N:0] Cache Properties


Bits |Field Name |Description
31:26 | — |Reserved.
25:14 |NumSharingCache |Specifies the number of logical processors sharing the cache enumerated by N, the value passed to the instruction in ECX. The number of logical processors sharing this cache is the value of this field incremented by 1. To determine which logical processors are sharing a cache, determine a ShareId for each processor as follows:
    ShareId = LocalApicId >> log2(NumSharingCache+1)
    Logical processors with the same ShareId then share a cache. If NumSharingCache+1 is not a power of two, round it up to the next power of two.
13:10 | — |Reserved.
9 |FullyAssociative |Fully associative cache. When set, indicates that the cache is fully associative. If 0 is returned in this field, the cache is set associative.
8 |SelfInitialization |Self-initializing cache. When set, indicates that the cache is self initializing; software initialization not required. If 0 is returned in this field, hardware does not initialize this cache.
7:5 |CacheLevel |Cache level. Identifies the level of this cache. Note that the enumeration value is not necessarily equal to the cache level.
    Bits         Description
    000b         Reserved.
    001b         Level 1
    010b         Level 2
    011b         Level 3
    111b-100b    Reserved.
4:0 |CacheType |Cache type. Identifies the type of cache.
    Bits         Description
    00h          Null; no more caches.
    01h          Data cache
    02h          Instruction cache
    03h          Unified cache
    1Fh-04h      Reserved.


CPUID Fn8000_001D_EBX_x[N:0] Cache Properties
See CPUID Fn8000_001D_EAX_x[N:0].


Bits |Field Name |Description
31:22 |CacheNumWays |Number of ways for this cache. The number of ways is the value returned in this field incremented by 1.
21:12 |CachePhysPartitions |Number of physical line partitions. The number of physical line partitions is the value returned in this field incremented by 1.
11:0 |CacheLineSize |Cache line size. The cache line size in bytes is the value returned in this field incremented by 1.


CPUID Fn8000_001D_ECX_x[N:0] Cache Properties 
See CPUID Fn8000_001D_EAX_x[N:0].


Bits |Field Name |Description
31:0 |CacheNumSets |Number of sets for set associative cache. The number of sets is the value returned in this field incremented by 1. Only valid for caches that are not fully associative (Fn8000_001D_EAX_xn[FullyAssociative] = 0).
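

A common use of the EBX and ECX fields is to derive the total cache size. The sketch below assumes the conventional product ways × partitions × line size × sets, with CacheNumSets read as a set count minus one; this formula is not stated in this section, and the function name is hypothetical. It applies only to caches that are not fully associative.

```c
#include <stdint.h>

/* Total size in bytes of a set-associative cache described by
 * CPUID Fn8000_001D EBX and ECX. Every field is stored as (value - 1). */
static uint64_t cache_size_bytes(uint32_t ebx, uint32_t ecx)
{
    uint64_t line_size  = (uint64_t)(ebx & 0xFFFu) + 1u;          /* EBX[11:0]  */
    uint64_t partitions = (uint64_t)((ebx >> 12) & 0x3FFu) + 1u;  /* EBX[21:12] */
    uint64_t ways       = (uint64_t)((ebx >> 22) & 0x3FFu) + 1u;  /* EBX[31:22] */
    uint64_t sets       = (uint64_t)ecx + 1u;                     /* ECX[31:0]  */
    return ways * partitions * line_size * sets;
}
```

For example, an 8-way cache with 64-byte lines, one partition, and 64 sets yields 32 KB.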


CPUID Fn8000_001D_EDX_x[N:0] Cache Properties 
See CPUID Fn8000_001D_EAX_x[N:0].


Bits |Field Name |Description
31:2 | — |Reserved.
1 |CacheInclusive |Cache inclusivity. A value of 0 indicates that this cache is not inclusive of lower cache levels. A value of 1 indicates that the cache is inclusive of lower cache levels.
0 |WBINVD |Write-Back Invalidate/Invalidate execution scope. A value of 0 returned in this field indicates that the WBINVD/INVD instruction invalidates all lower level caches of non-originating logical processors sharing this cache. When set, this field indicates that the WBINVD/INVD instruction is not guaranteed to invalidate all lower level caches of non-originating logical processors sharing this cache.


E.4.16 Function 8000_001Eh—Processor Topology Information 
CPUID Fn8000_001E_EAX Extended APIC ID 


If CPUID Fn8000_0001_ECX[TopologyExtensions] = 0, this function number is reserved. 


Bits |Field Name |Description
31:0 |ExtendedApicId |Extended APIC ID. If MSR0000_001B[ApicEn] = 0, this field is reserved.


CPUID Fn8000_001E_EBX Compute Unit Identifiers


See CPUID Fn8000_001E_EAX. 


Bits |Field Name |Description
31:16 | — |Reserved.
15:8 |ThreadsPerComputeUnit |Threads per compute unit (zero-based count). The actual number of threads per compute unit is the value of this field + 1. To determine which logical processors (threads) belong to a given Compute Unit, determine a ShareId for each processor as follows:
    ShareId = LocalApicId >> log2(ThreadsPerComputeUnit+1)
    Logical processors with the same ShareId then belong to the same Compute Unit. (If ThreadsPerComputeUnit+1 is not a power of two, round it up to the next power of two).
7:0 |ComputeUnitId |Compute unit ID. Identifies a Compute Unit, which may be one or more physical cores that each implement one or more logical processors.
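

The ShareId computation, including the round-up to the next power of two, can be sketched as below. The same helper applies to ThreadsPerComputeUnit here and to NumSharingCache in CPUID Fn8000_001D_EAX, since both are zero-based counts. The function names are hypothetical.

```c
#include <stdint.h>

/* Smallest s such that (1 << s) >= count, for count >= 1. This is log2 of
 * count after rounding count up to the next power of two. */
static unsigned ceil_log2(uint32_t count)
{
    unsigned s = 0;
    while ((UINT32_C(1) << s) < count)
        s++;
    return s;
}

/* zero_based_count is the raw field value (actual count minus 1). */
static uint32_t share_id(uint32_t local_apic_id, uint32_t zero_based_count)
{
    return local_apic_id >> ceil_log2(zero_based_count + 1u);
}
```

Two logical processors whose `share_id` values match belong to the same compute unit (or share the same cache, when the count comes from NumSharingCache).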


CPUID Fn8000_001E_ECX Node Identifiers


See CPUID Fn8000_001E_EAX. 


Bits |Field Name |Description
31:11 | — |Reserved.
10:8 |NodesPerProcessor |Specifies the number of nodes in the package/socket in which this logical processor resides. Node in this context corresponds to a processor die. Encoding is N-1, where N is the number of nodes present in the socket.
7:0 |NodeId |Specifies the ID of the node containing the current logical processor. NodeId values are unique across the system.


CPUID Fn8000_001E_EDX Reserved


The value returned in EDX is undefined and is reserved. 


E.4.17 Function 8000_001Fh—Encrypted Memory Capabilities 


CPUID Fn8000_001F_EAX Secure Encryption 


Bits |Field Name |Description
31:25 | — |Reserved.
24 |VmsaRegProt |VMSA Register Protection supported
23:17 | — |Reserved.
16 |VTE |Virtual Transparent Encryption supported
15 |PreventHostIbs |Disallowing IBS use by the host supported
14 |DebugSwap |Full debug state swap supported for SEV-ES guests
13 |AlternateInjection |Alternate Injection supported
12 |RestrictedInjection |Restricted Injection supported
11 |64BitHost |SEV guest execution only allowed from a 64-bit host
10 |HwEnfCacheCoh |Hardware cache coherency across encryption domains enforced
9 | — |Reserved.
8 |SecureTsc |Secure TSC supported
7:6 | — |Reserved.
5 |VMPL |VM Permission Levels supported
4 |SEV-SNP |SEV Secure Nested Paging supported
3 |SEV-ES |SEV Encrypted State supported
2 |PageFlushMsr |Page Flush MSR available
1 |SEV |Secure Encrypted Virtualization supported
0 |SME |Secure Memory Encryption supported


CPUID Fn8000_001F_EBX Secure Encryption 


Bits |Field Name |Description
31:16 | — |Reserved.
15:12 |NumVMPL |Number of VM Permission Levels supported
11:6 |PhysAddrReduction |Physical Address bit reduction
5:0 |CbitPosition |C-bit location in page table entry
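

As an illustration of how these fields might be consumed, the sketch below extracts the three fields from the EBX value and builds a 64-bit mask with only the C-bit set, suitable for OR-ing into a page table entry. The function names are hypothetical, and the mask helper is an assumption about typical usage rather than something this table prescribes.

```c
#include <stdint.h>

/* CbitPosition, EBX[5:0], turned into a single-bit mask for a 64-bit PTE. */
static uint64_t cbit_mask(uint32_t ebx)
{
    return UINT64_C(1) << (ebx & 0x3Fu);
}

/* PhysAddrReduction, EBX[11:6]. */
static unsigned phys_addr_reduction(uint32_t ebx)
{
    return (ebx >> 6) & 0x3Fu;
}

/* NumVMPL, EBX[15:12]. */
static unsigned num_vmpl(uint32_t ebx)
{
    return (ebx >> 12) & 0xFu;
}
```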


CPUID Fn8000_001F_ECX Secure Encryption 


Bits |Field Name |Description
31:0 |NumEncryptedGuests |Number of encrypted guests supported simultaneously


CPUID Fn8000_001F_EDX Minimum ASID 


Bits |Field Name |Description
31:0 |MinSevNoEsAsid |Minimum ASID value for an SEV enabled, SEV-ES disabled guest


E.4.18 Function 8000_0020—Reserved


E.4.19 Function 8000_0021—Extended Feature Identification 2


CPUID Fn8000_0021_EAX Extended Feature 2


Bits |Field Name |Description
31:14 | — |Reserved
13 |PrefetchCtlMsr |Prefetch control MSR supported. See Core::X86::Msr::PrefetchControl in BKDG or PPR for details.
12:10 | — |Reserved
9 |NoSmmCtlMSR |SMM_CTL MSR (C001_0116h) is not supported.
8 | — |Reserved
7 |UpperAddressIgnore |Upper Address Ignore is supported.
6 |NullSelectClearsBase |Null segment selector loads also clear the destination segment register base and limit.
5:4 | — |Reserved
3 |SmmPgCfgLock |SMM paging configuration lock supported.
2 |LFenceAlwaysSerializing |LFENCE is always dispatch serializing.
1 | — |Reserved
0 |NoNestedDataBp |Processor ignores nested data breakpoints


CPUID Fn8000_0021_EBX Extended Feature 2


Bits |Field Name |Description
31:12 | — |Reserved.
11:0 |MicrocodePatchSize |The size of the Microcode patch in 16-byte multiples. If 0, the size of the patch is at most 5568 (15C0h) bytes.
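

The MicrocodePatchSize encoding can be turned into a byte count as sketched below. The function name is hypothetical; the input is the EBX value returned by CPUID Fn8000_0021. When the field is zero the result is only an upper bound, as the description states.

```c
#include <stdint.h>

/* Microcode patch size in bytes. A zero field means "at most 5568 bytes",
 * so the return value is then an upper bound rather than an exact size. */
static uint32_t microcode_patch_bytes(uint32_t ebx)
{
    uint32_t field = ebx & 0xFFFu;   /* MicrocodePatchSize, EBX[11:0] */
    return field == 0 ? 5568u : field * 16u;
}
```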


CPUID Fn8000_0021_E[C,D]X Reserved 


The values returned in ECX and EDX are undefined and are reserved. 


CPUID Fn8000_0022_EAX Reserved


Bits |Field Name |Description
31:0 | — |Reserved


CPUID Fn8000_0022_EBX Extended Performance Monitoring and Debug


Bits |Field Name Description 

31:16 |— Reserved 

15:10 |NumDfPmc Number of available Northbridge Performance Monitor Counters 
9:4 |— Reserved 

3:0 |NumCorePmc Number of Core Performance Counters 


CPUID Fn8000_0022_E[C,D]X Reserved


The values returned in ECX and EDX are undefined and are reserved. 


E.5 Multiple Processor Calculation 


Operating systems may use one of two possible methods to calculate the actual number of logical processors 
per package (NC), and the maximum possible number of logical processors per package (MNLP). The 
extended method is recommended, but a legacy method is also available. 


E.5.1 Legacy Method 
The CPUID identification of total number of logical processors per package is derived from information 
returned by the following fields: 


* CPUID Fn0000_0001_EBX[LogicalProcessorCount] 

* CPUID Fn0000_0001_EDX[HTT] (Hyper-Threading Technology) 
* CPUID Fn8000_0001_ECX[CmpLegacy] 

* CPUID Fn8000_0008_ECX[NC] 


Table E-5 defines LogicalProcessorCount, HTT, CmpLegacy, and NC as a function of the number of 
logical processors per package (n). 


When HTT = 0, LogicalProcessorCount is reserved and the package contains one logical processor. 


When HTT = 1 and CmpLegacy = 1, LogicalProcessorCount represents the number of logical processors per 
package (n). 


Table E-5. LogicalProcessorCount, CmpLegacy, HTT, and NC 


Logical Processors per package |CmpLegacy |HTT |LogicalProcessorCount |NC
1 |0 |0 |Reserved |0
2 or more |1 |1 |n |n-1


The use of CmpLegacy and LogicalProcessorCount for determining the number of logical processors is
deprecated. Instead, use NC to determine the number of logical processors per package.
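

The two derivations in this section can be sketched as follows. The bit positions used for HTT (Fn0000_0001_EDX bit 28), LogicalProcessorCount (Fn0000_0001_EBX[23:16]), and CmpLegacy (Fn8000_0001_ECX bit 1) come from the register descriptions elsewhere in this appendix, not from this section. The function names are hypothetical, and returning 0 for a field combination that Table E-5 does not cover is a choice made for this sketch.

```c
#include <stdint.h>

/* Preferred: logical processors per package from CPUID Fn8000_0008_ECX[7:0]. */
static unsigned logical_processors_from_nc(uint32_t fn8000_0008_ecx)
{
    return (fn8000_0008_ecx & 0xFFu) + 1u;
}

/* Deprecated derivation from HTT, CmpLegacy, and LogicalProcessorCount.
 * Returns 0 when the combination is not listed in Table E-5. */
static unsigned logical_processors_deprecated(uint32_t fn0000_0001_ebx,
                                              uint32_t fn0000_0001_edx,
                                              uint32_t fn8000_0001_ecx)
{
    unsigned htt = (fn0000_0001_edx >> 28) & 1u;
    unsigned cmp_legacy = (fn8000_0001_ecx >> 1) & 1u;

    if (!htt)
        return 1u;                               /* one logical processor */
    if (cmp_legacy)
        return (fn0000_0001_ebx >> 16) & 0xFFu;  /* LogicalProcessorCount */
    return 0u;                                   /* not covered by Table E-5 */
}
```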




E.5.2 Extended Method (Recommended) 

The CPUID identification of total number of logical processors per package is derived from information
returned by CPUID Fn8000_0008_ECX[ApicIdSize[3:0]]. This field indicates the number of least significant
bits in CPUID Fn0000_0001_EBX[LocalApicId] that indicate the logical processor ID within the package.
The size of this field determines the maximum number of logical processors (MNLP) that the package
could theoretically support, and not the actual number of logical processors that are implemented or enabled in
the package, as indicated by CPUID Fn8000_0008_ECX[NC].


A value of zero for ApicIdSize[3:0] indicates that the legacy method (Section E.5.1) should be used to derive
the maximum number of logical processors:


MNLP = CPUID Fn8000_0008_ECX[NC] + 1.

And for non-zero values of ApicIdSize[3:0]:


MNLP = 2 raised to the power of ApicIdSize[3:0]


Appendix F Instruction Effects on RFLAGS


The flags in the RFLAGS register are described in “Flags Register” in Volume 1 and “RFLAGS 
Register” in Volume 2. Table F-1 summarizes the effect that instructions have on these flags. The table 
includes all instructions that affect the flags. Instructions not shown have no effect on RFLAGS. 


The following codes are used within the table: 


0—The flag is always cleared to 0.
1—The flag is always set to 1.
AH—The flag is loaded with value from AH register.
Mod—The flag is modified, depending on the results of the instruction.
Pop—The flag is loaded with value popped off of the stack.
Tst—The flag is tested.
U—The effect on the flag is undefined.

Gray shaded cells indicate that the flag is not affected by the instruction.


Table F-1. Instruction Effects on RFLAGS 


Instruction |RFLAGS Mnemonic and Bit Number
Mnemonic | ID | VIP | VIF | AC | VM | RF | NT |IOPL| OF | DF | IF | TF | SF | ZF | AF | PF | CF
 | 21 | 20 | 19 | 18 | 17 | 16 | 14 |13:12| 11 | 10 | 9 | 8 | 7 | 6 | 4 | 2 | 0
AAA Tst 
AAS U U U Mod U | Mod 
free U Mod|Mod} U |Mod| U 
ADC Mod Mod | Mod | Mod | Mod | , ‘St 
Mod 
ADD Mod Mod | Mod | Mod | Mod | Mod 
AND 0 Mod |Mod} U |Mod} 0 
ARPL Mod 
8 U U |Mod]| U U U 
BT 
BTC 
BTR U U U U U | Mod 
BTS 
BZHI 0 Mod | Mod) U U | Mod 
CLC 0 
CLD 0 
CLI Mod TST Mod 
CMC Mod 
CMOVcc Tst Tst | Tst Tst | Tst 
CMP Mod Mod | Mod | Mod | Mod | Mod
CMPSx Mod | Tst Mod | Mod | Mod | Mod | Mod 
CMPXCHG Mod Mod | Mod | Mod | Mod | Mod 
CMPXCHG8B Mod 
CMPXCHG16B Mod 
eee 0 0 |Mod}| O | Mod | Mod 
ae U Mod | Mod | 7°", | Mod | Jot 
DEC Mod Mod | Mod | Mod | Mod 
DIV U U U U U U 
FCMOVcc Tst Tst | Tst 
FCOMI 
Bate Mod Mod | Mod 
FUCOMIP 
IDIV U U U U U U 
IMUL Mod U U U U | Mod 
INC Mod Mod | Mod | Mod | Mod 
IN Tst 
INSx Tst Tst 
he Mod | Mod ae 0 |Mod] Tst Mod| 0 
INTO mod|.'S¢} 9 |Mod| Tst | Tst Mod | Mod 
Mod 
IRETX Pop | Pop | Pop | Pop as Pop hie Pop | Pop | Pop | Pop | Pop | Pop | Pop | Pop | Pop 
Jcc Tst Tst | Tst Tst | Tst 
LAR Mod 
LODSx Tst 
LOOPE 
LOOPNE is 
LSL Mod 
LZCNT U U |Mod]| U U | Mod 
MOVSx Tst 
MUL Mod U U U U | Mod 
NEG Mod Mod | Mod | Mod | Mod | Mod 
OR 0 Mod |Mod} U |Mod} 0 
OUT Tst 
OUTSx Tst Tst 
PSMASH Mod Mod | Mod | Mod | Mod 
PVALIDATE Mod Mod | Mod | Mod | Mod | Mod 
POPCNT 0 0 | Mod | 0 0 0
POPFx Pop | Tst |Mod)| Pop} Tst | O | Pop a Pop | Pop | Pop | Pop | Pop | Pop | Pop | Pop | Pop 
Tst 
RCL1 Mod Mod 
Tst 
RCL count U Mod 
Tst 
RCR 1 Mod Mod 
Tst 
RCR count U Mod 
RMPADJUST Mod Mod | Mod | Mod | Mod 
RMPUPDATE Mod Mod | Mod | Mod | Mod 
ROL 1 Mod Mod 
ROL count U Mod 
ROR 1 Mod Mod 
ROR count U Mod 
RSM Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod
SAHF AH | AH | AH | AH | AH 
SHL/SAL 1 Mod Mod | Mod} U_ | Mod | Mod 
SHL/SAL count U Mod | Mod} U_ | Mod | Mod 
SAR 1 Mod Mod | Mod} U_ | Mod | Mod 
SAR count U Mod | Mod} U_ | Mod | Mod 
SBB Mod Mod | Mod | Mod | Mod nay 
SCASx Mod | Tst Mod | Mod | Mod | Mod | Mod 
SETcc Tst Tst | Tst Tst | Tst 
aie : Mod Mod |Mod} U_ | Mod | Mod 
eee pn U Mod |Mod| U_ | Mod | Mod 
SHR 1 Mod Mod |Mod} U_ | Mod | Mod 
SHR count U Mod | Mod} U_ | Mod | Mod 
STC 1 
STD 1 
STI Mod Tst Mod 
STOSx Tst 
SUB Mod Mod | Mod | Mod | Mod | Mod 
SYSCALL Mod |Mod}Mod|Mod} 0 O |Mod} Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod 
SYSENTER 0 0 0) 
SYSRET Mod | Mod | Mod | Mod O |Mod} Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod | Mod 
TEST 0 Mod | Mod | U | Mod | 0
UCOMISD 
UCOMISS 0 O |Mod]| O | Mod | Mod 
VERR 
VERW Mod
XADD Mod Mod | Mod | Mod | Mod | Mod
XOR 0 Mod | Mod | U | Mod | 0